Abstract


We show that the first-order theory of structural subtyping 
of non-recursive types is decidable. 

Let $\Sigma$ be a language consisting of function symbols
(representing type constructors) and $C$ a decidable structure in
the relational language $L$ containing a binary relation $\le$. $C$
represents primitive types; $\le$ represents a subtype ordering.
We introduce the notion of $\Sigma$-term-power of $C$, which
generalizes the structure arising in structural subtyping. The
domain of the $\Sigma$-term-power of $C$ is the set of $\Sigma$-terms
over the set of elements of $C$.

We show that the decidability of the first-order theory of
$C$ implies the decidability of the first-order theory of the
$\Sigma$-term-power of $C$. This result implies the decidability of
the first-order theory of structural subtyping of non-recursive
types.

Our decision procedure is based on quantifier elimination
and makes use of quantifier elimination for term algebras
and the Feferman-Vaught construction for products of decidable
structures.

We also explore connections between the theory of struc- 
tural subtyping of recursive types and monadic second-order 
theory of tree-like structures. In particular, we give an em- 
bedding of the monadic second-order theory of infinite bi- 
nary tree into the first-order theory of structural subtyping 
of recursive types. 
1 Introduction


Subtyping constraints are an important technique for check- 
ing and inferring program properties, used both in type sys- 
tems and program analyses 
421) 171 64) (7 (8) Gs) 22) 47 (19). 

This paper presents a decision procedure for the first-order
theory of structural subtyping of non-recursive types.
This result solves (for the case of non-recursive types) a
problem left open in [48]. [48] provides the decidability
result for structural subtyping of only unary type constructors,
whereas we solve the problem for any number of constructors
of any arity. Furthermore, we do not impose any constraints
on the subtyping relation $\le$; it need not even be a partial
order. The generality of our construction makes it potentially
of independent interest in logic and model theory.

We approach the problem of structural subtyping using
quantifier elimination and, to some extent, using monadic
second-order logic of tree-like structures. This paper makes
the following contributions:


• we give a new presentation of the Feferman-Vaught theorem
for direct products using a multisorted logic (Section 3.3);
for completeness we also include a proof of quantifier
elimination for boolean algebras of sets (Section 3.2);

• we give a new presentation of decidability of the first-order
theory of term algebras; the proof uses the language of both
constructor and selector symbols (Section 3.4);

• as an introduction to the main result, we show decidability
of structural subtyping with one covariant binary constructor
and two constants (Section 4); this result does not rely on
the Feferman-Vaught technique;

• we present a new construction, term-power algebra, for
creating tree-like theories based on existing theories
(Section 5);

• as a central result, we prove that if the base theory
is decidable, so is the theory of term-power with arbitrary
variance of constructors; we give an effective decision
procedure for quantifier elimination in term-power structure;
the procedure combines elements of quantifier elimination in
the Feferman-Vaught theorem and quantifier elimination in
term algebras (Sections 5, 6);

• we show the decidability of structural subtyping of
non-recursive types as a direct consequence of the main result;

• we give a simple embedding of monadic second-order
theory of infinite binary tree into the theory of structural
subtyping of recursive types with two primitive types
(Section 7);

• we show that structural subtyping of recursive types
where terms range over constant shapes is decidable
(Section 7.4).


In addition to showing the decidability of structural sub- 
typing, our hope is to promote the important technique of 
quantifier elimination, which forms the basis of our result. 

Quantifier elimination [Section 2.7] is a fruitful technique
that was used to show decidability and classification
of boolean algebras, decidability of term algebras
[31, Chapter 23], [39], [30], with membership constraints [10]
and with queues [43], decidability of products [14],
[Chapter 12], and algebraically closed fields [50].

The complexity of the decision problem for the first-order
theory of structural subtyping has a non-elementary lower
bound. This is a consequence of a general theorem about
pairing functions [15, Theorem 1.2, Page 163] and applies to
term algebras already, as observed in [43].


2 Preliminaries 


In this section we review some notions used in this paper.

If $w$ is a word over some alphabet, we write $|w|$ for the
length of $w$. We write $w_1 \cdot w_2$ to denote the concatenation
of words $w_1$ and $w_2$.

A node v in a directed graph is a sink if v has no outgoing 
edges. A node v in a directed graph is a source if v has no 
incoming edges. 

We write $E_1 \equiv E_2$ to denote equality of syntactic entities
$E_1$ and $E_2$.

We write $\bar{x}$ to denote some sequence of variables
$x_1, \ldots, x_n$.

We assume that formulas are built from propositional
connectives $\land$, $\lor$, $\lnot$; the remaining connectives are
defined as shorthands. Connective $\lnot$ binds the strongest,
followed by $\land$ and $\lor$.

A literal $B$ is an atomic formula $A$ or a negation of an
atomic formula $\lnot A$. We define complementation of a literal
by $\overline{A} \equiv \lnot A$ and $\overline{\lnot A} \equiv A$.

A formula $\psi$ is in prenex form if it is of the form

$$Q_1 x_1 \ldots Q_n x_n.\ \phi$$

where $Q_i \in \{\forall, \exists\}$ for $1 \le i \le n$ and $\phi$ is a
quantifier-free formula. We call $\phi$ a matrix of $\psi$.

If $\phi$ is a formula then $FV(\phi)$ denotes the set of free
variables in $\phi$.

We write $[x_1 \mapsto a_1, \ldots, x_k \mapsto a_k]$ for the
substitution $\sigma$ such that $\sigma(x_i) = a_i$ for $1 \le i \le k$.

If $\phi$ is a formula and $t_1, \ldots, t_k$ terms, we write
$\phi[x_1 := t_1, \ldots, x_k := t_k]$ for the result of
simultaneously substituting free occurrences of variables $x_i$
with term $t_i$, for $1 \le i \le k$.

We write $h(t)$ for the height of term $t$. $h(a) = 0$ if $a$ is
a constant, $h(x) = 0$ if $x$ is a variable. If $f(t_1, \ldots, t_k)$
is a term then

$$h(f(t_1, \ldots, t_k)) = 1 + \max(h(t_1), \ldots, h(t_k))$$

We assume that all function symbols are of finite arity. If
there are finitely many function symbols then for any
nonnegative integer $k$ there is only a finite number of terms $t$
such that $h(t) \le k$.
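The height function and the finiteness remark can be illustrated with a small sketch. The `Term` class, the helper `terms_up_to`, and the example signature are hypothetical names introduced here for illustration only; they are not part of the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Term:
    """A term: a symbol applied to zero or more argument terms (illustrative)."""
    symbol: str
    args: Tuple["Term", ...] = ()


def height(t: Term) -> int:
    """h(a) = 0 for constants and variables, 1 + max over arguments otherwise."""
    if not t.args:
        return 0
    return 1 + max(height(a) for a in t.args)


def terms_up_to(signature: Dict[str, int], k: int) -> Set[Term]:
    """All ground terms of height <= k over a finite signature {symbol: arity}."""
    terms = {Term(f) for f, n in signature.items() if n == 0}
    for _ in range(k):
        grown = set(terms)
        for f, n in signature.items():
            if n > 0:
                for args in product(terms, repeat=n):
                    grown.add(Term(f, args))
        terms = grown
    return terms
```

For an assumed signature with two constants and one binary symbol, the set returned for each bound `k` is finite, which matches the remark that only finitely many terms have bounded height.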

If $\phi(u)$ is a conjunction of literals, we say that $\phi'$ results
from $\exists u.\ \phi(u)$ by dropping quantified variable $u$ iff
$\phi'$ is the result of eliminating from $\phi(u)$ all conjuncts
containing $u$. More generally, if $\psi$ is a formula of form

$$Q_1 x_1 \ldots Q u \ldots Q_k x_k.\ \psi_0$$

then the result of dropping $u$ from $\psi$ is

$$Q_1 x_1 \ldots Q_k x_k.\ \psi_0'$$

where $\psi_0'$ is the result of dropping $u$ from $\exists u.\ \psi_0$.

An equality is an atomic formula $t_1 = t_2$ where $t_1$ and $t_2$
are terms. A disequality is a negation of an equality.


We use the usual Tarskian semantics of formulas. Unless
otherwise stated, $\phi \models \psi$ will denote that formula
$\phi \Rightarrow \psi$ is true in a fixed relational structure that
is under current consideration.

Occasionally we find it convenient to work with multisorted
logic, where the domain is a union of disjoint sets called
sorts, and arity specifies the sorts of all operations.
Constants are operations with zero arguments. Relations are
operations that return the result in a distinguished sort bool
interpreted over the boolean lattice {false, true} or over the
distributive lattice of three-valued logic {false, true, undef}
from Section 2.3.

A structure $\mathcal{C}$ of a given language $L$ is a pair of domain
$C$ and the interpretation function $\llbracket \_ \rrbracket^{\mathcal{C}}$.
Hence, we name operations of the structure using symbols of the
language and the interpretation function. If $\mathcal{C}$ is clear
from the context we write simply $\llbracket \_ \rrbracket$ for
$\llbracket \_ \rrbracket^{\mathcal{C}}$.

In some later sections, including Section 6, we use logic with
several kinds of quantifiers. Our logic is first-order, but we give
higher-order types to quantifiers. For example, a quantifier

$$Q : (A \to B) \to B$$

denotes a quantifier that binds variables of sort $A$ enclosed
within an expression of sort $B$ and returns an expression of
sort $B$. If $X$ and $Y$ are sets then $X \to Y$ denotes the set of
all functions from $X$ to $Y$. When specifying the semantics
of the quantifier $Q$ we specify a function

$$\llbracket Q \rrbracket : (\llbracket A \rrbracket \to \llbracket B \rrbracket) \to \llbracket B \rrbracket$$

The semantics of an expression $M$ of sort $B$ takes an environment
$\sigma$, which is a function from variable names to elements
of $A$, and produces an element of $B$, hence
$\llbracket M \rrbracket \sigma \in \llbracket B \rrbracket$. We
define the semantics of an expression $Q x.\ M$ by:

$$\llbracket Q x.\ M \rrbracket \sigma = \llbracket Q \rrbracket\, h$$

where $h : \llbracket A \rrbracket \to \llbracket B \rrbracket$ is the function

$$h(a) = \llbracket M \rrbracket (\sigma[x := a])$$

Here

$$\sigma[x := a](y) = \begin{cases} \sigma(y), & \text{if } y \ne x \\ a, & \text{if } y = x \end{cases}$$


Specifying types for quantifiers allows to express more 

Let $\sigma_d$ be some arbitrary dummy global environment. If
$F$ is a formula without global variables we write
$\llbracket F \rrbracket \sigma_d$ to denote the truth value of $F$;
clearly $\llbracket F \rrbracket \sigma_d$ does not depend
on $\sigma_d$ and we denote it simply $\llbracket F \rrbracket$ when
no ambiguity arises.

We use Hilbert's epsilon as a notational convenience in
metatheory. If $P(x)$ is a unary predicate, then $\varepsilon x.\ P(x)$
denotes an arbitrary element $d$ such that $P(d)$ holds, if such
an element exists, or an arbitrary object otherwise.


2.1 Term Algebra 


We introduce the notion of term algebra [Page 14].

Let Nat be the set of natural numbers. Let the signature
$\Sigma$ be a finite set of function symbols and constants and let
$ar : \Sigma \to \mathrm{Nat}$ be a function specifying arity $ar(f)$
for every function symbol or constant $f \in \Sigma$. Let
$FT(\Sigma)$ denote the set of finite ground terms over signature
$\Sigma$. We assume that $\Sigma$ contains at least one constant
$c \in \Sigma$, $ar(c) = 0$, and at least one function symbol
$f \in \Sigma$, $ar(f) > 0$. Therefore, $FT(\Sigma)$ is countably
infinite.

Let $Cons(\Sigma)$ be the term algebra interpretation of signature
$\Sigma$, defined as follows [Page 14]. For every $f \in \Sigma$ with
$ar(f) = k$ define $\llbracket f \rrbracket \in Cons(\Sigma)$, with
$\llbracket f \rrbracket : FT(\Sigma)^k \to FT(\Sigma)$ by

$$\llbracket f \rrbracket(t_1, \ldots, t_k) = f(t_1, \ldots, t_k)$$

We will write $f$ instead of $\llbracket f \rrbracket$ when it causes
no confusion.


2.2 Terms as Trees 


We define trees representing terms as follows.

We use sequences of nonnegative integers to denote paths
in the tree. Let $\Sigma$ be a signature. A tree over $\Sigma$ is a
partial function $t$ from the set $\mathrm{Nat}^*$ of paths to the set
$\Sigma$ of function symbols such that:

1. if $w \in \mathrm{Nat}^*$, $x \in \mathrm{Nat}$, and $t(w \cdot x)$ is
defined, then $t(w)$ is defined as well;

2. if $t(w) = f$ with $ar(f) = k$, then
$\{ i \mid t(w \cdot i) \text{ is defined} \} = \{1, \ldots, k\}$.

A finite tree is a tree with a finite domain.
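The two conditions in the definition can be checked mechanically for finite trees. The following is a minimal sketch that represents a finite tree as a Python dictionary from paths (tuples of positive integers) to symbols; the representation and the names `is_tree` and `term_to_tree` are assumptions made for this illustration.

```python
from typing import Dict, Tuple

Path = Tuple[int, ...]


def is_tree(t: Dict[Path, str], arity: Dict[str, int]) -> bool:
    """Check conditions 1 and 2 of the tree definition on a finite partial map."""
    for path, sym in t.items():
        # Condition 1: if t(w . x) is defined then t(w) is defined.
        if path and path[:-1] not in t:
            return False
        # Condition 2: the defined children of a node labeled f are exactly 1..ar(f).
        children = {p[-1] for p in t
                    if len(p) == len(path) + 1 and p[:-1] == path}
        if children != set(range(1, arity[sym] + 1)):
            return False
    return True


def term_to_tree(term, path: Path = ()) -> Dict[Path, str]:
    """Convert a nested tuple ("f", arg1, ..., argk) into a path-indexed tree."""
    sym, *args = term
    tree = {path: sym}
    for i, arg in enumerate(args, start=1):
        tree.update(term_to_tree(arg, path + (i,)))
    return tree
```

Under this encoding the root is the empty path, and the i-th argument of a node at path w sits at path w followed by i.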


2.3. First Order Structures with Partial Functions 


We make use of partial functions in our quantifier elimina- 
tion procedures. In this section we briefly describe the ap- 
proach to partial functions we chose to use; other approaches 
would work as well, see e.g. [24]. 

A language of partial functions $\Sigma_1$ contains partial
function symbols in addition to total function symbols and
relation symbols. Consider a structure with the domain $A$
interpreting a language with partial function symbols. Given
some environment $\sigma$, we have
$\llbracket t \rrbracket \sigma \in A \cup \{\bot\}$ where $\bot \notin A$
is a special value denoting undefined results. We require the
interpretations of total and partial function symbols to be
strict in $\bot$, i.e.
$f(a_1, \ldots, a_i, \bot, a_{i+2}, \ldots, a_k) = \bot$.

We interpret atomic formulas and their negations over
the three-valued domain {false, true, undef} using strong
Kleene's three-valued logic [44]. We require that
$\llbracket R \rrbracket(a_1, \ldots, a_i, \bot, a_{i+2}, \ldots, a_k) = \mathsf{undef}$
for every relational symbol $R$. Logical connectives in Kleene's
strong three-valued logic are the strongest "regular" extension
of the corresponding connectives on the two-valued domain. The
regularity requirement means that the three-valued logic is
a sound approximation of two-valued logic in the following
sense. We may obtain the truth tables for three-valued logic
by considering the truth values false, true, undef as shorthands
for sets {false}, {true}, {false, true} and defining each
logical operation $\circ$ by:

$$s_1 \,\llbracket \circ \rrbracket\, s_2 = \{ b_1 \circ b_2 \mid b_1 \in s_1 \land b_2 \in s_2 \}$$

where $\circ$ on the right-hand side denotes the corresponding
operation in the two-valued logic. As in a call-by-value
semantics of lambda calculus, variables in the environments
($\sigma$) do not range over $\bot$. We interpret quantifiers as
ranging over the domain $A$ or its subset if the logic is
multisorted; the interpretations of quantifiers are similarly
the best regular approximations of the corresponding two-valued
interpretations.
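The set-based reading of the three truth values can be turned directly into code. The sketch below lifts two-valued connectives to three values exactly as in the displayed definition; the names `lift1`, `lift2`, `k_and`, `k_or`, `k_not` are introduced here for illustration.

```python
from itertools import product

F, T, U = "false", "true", "undef"

# Each three-valued truth value is a shorthand for a set of two-valued ones.
AS_SET = {F: frozenset({False}), T: frozenset({True}), U: frozenset({False, True})}
FROM_SET = {s: v for v, s in AS_SET.items()}


def lift2(op):
    """Lift a binary two-valued operation to three values via sets."""
    def lifted(x, y):
        results = frozenset(op(a, b) for a, b in product(AS_SET[x], AS_SET[y]))
        return FROM_SET[results]
    return lifted


def lift1(op):
    """Lift a unary two-valued operation to three values via sets."""
    def lifted(x):
        return FROM_SET[frozenset(op(a) for a in AS_SET[x])]
    return lifted


k_and = lift2(lambda a, b: a and b)
k_or = lift2(lambda a, b: a or b)
k_not = lift1(lambda a: not a)
```

With this construction, `k_and` and `k_or` coincide with minimum and maximum on the chain false < undef < true, and `k_not` leaves undef fixed.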

These properties of Kleene's three-valued logic have the
following important consequence. Suppose that we extend
the definition of all partial functions to make them total
functions on the domain $A$ by assigning arbitrary values
outside the original domain. Suppose that a formula $\phi$ evaluates
to an element $b \in \{\mathsf{false}, \mathsf{true}\}$ in Kleene's logic.
Then $\phi$ evaluates to the same truth value $b$ in the new logic of
total functions. This property of three-valued logic implies that
the algorithms that we use to transform formulas with partial
functions will apply even for the logic that makes all
functions total by completing them with arbitrary elements
of $A$.

We say that a formula $\phi$ is well-defined iff its truth value
is an element of {false, true}.


Example 1 Consider the domain of real numbers. The
following formulas are not well-defined:

$$3 = 1/0$$
$$\forall x.\ 1/x > 0 \lor 1/x < 0 \lor 1/x = 0$$

The following formulas are well-defined:

$$\exists x.\ 1/x = 3$$
$$\forall x.\ 1/x \ne 3$$
$$x = 0 \lor 1/x > 0$$


We say that a formula $\phi_1$ is equivalent to a formula $\phi_2$
and write $\phi_1 \cong \phi_2$ iff

$$\llbracket \phi_1 \rrbracket \sigma = \llbracket \phi_2 \rrbracket \sigma$$

for all valuations $\sigma$ (including those for which
$\llbracket \phi_1 \rrbracket \sigma = \mathsf{undef}$).

Sections below perform equivalence-preserving transformations
of formulas. This means that starting from a well-defined
formula we obtain an equivalent well-defined formula.

When doing equivalence-preserving transformations it is
useful to observe that $\land$, $\lor$ still form a distributive
lattice. The partial order of this lattice is the chain
false < undef < true. The element undef does not have a
complement in the lattice; unary operation $\lnot$ does not denote
the lattice complement. However, the following laws still hold:

$$\lnot (x \land y) \cong \lnot x \lor \lnot y$$
$$\lnot (x \lor y) \cong \lnot x \land \lnot y$$
$$\lnot \lnot x \cong x$$


The properties of $\land$, $\lor$, $\lnot$ are sufficient to transform
any quantifier-free formula into a disjunction of conjunctions of
literals using the well-known straightforward technique. However,
this straightforward technique in some cases yields conjunctions
that are not well-defined, even though the formula
as a whole is well-defined.

Example 2 Transforming a negation of a well-defined formula:

$$\lnot (x \ne 0 \land (y = 1/x \lor z = x + 1))$$

may yield the following disjunction of conjunctions:

$$x = 0 \lor (y \ne 1/x \land z \ne x + 1)$$

where $y \ne 1/x \land z \ne x + 1$ is not a well-defined conjunction
for $x = 0$.


To enable the transformation of each well-defined formula
into a disjunction of well-defined conjunctions of literals,
we enrich the language of function and relation symbols
as follows. With each partial function symbol $f \in \Sigma_1$ of
arity $k = ar(f)$ we associate a domain description
$D_f = (\langle x_1, \ldots, x_k \rangle, \phi)$ specifying the domain
of $f$. Here $x_1, \ldots, x_k$ are distinct variables and $\phi$ is
an unnested conjunction of literals such that
$FV(\phi) \subseteq \{x_1, \ldots, x_k\}$. We require every
interpretation of a first-order structure with partial function
symbols to satisfy the following property:

$$\llbracket f \rrbracket(a_1, \ldots, a_k) \ne \bot \iff \llbracket \phi \rrbracket [x_1 \mapsto a_1, \ldots, x_k \mapsto a_k] = \mathsf{true}$$

for all $a_1, \ldots, a_k \in A$. We henceforth assume that every
structure with partial functions is equipped with a domain
description $D_f$ for every partial function symbol $f$.

Proposition 8 below gives an algorithm for transforming
a given well-defined formula into a disjunction of
well-defined conjunctions. We first give some definitions and
lemmas.


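Definition 3 If $\psi$ is a formula with free variables, a domain
formula for $\psi$ is a formula $\phi$ not containing partial
function symbols such that, for every valuation $\sigma$,

$$\llbracket \psi \rrbracket \sigma \ne \mathsf{undef} \iff \llbracket \phi \rrbracket \sigma = \mathsf{true}$$

From Definition 3 we obtain the following Lemma 4.

Lemma 4 Let $\psi$ be a formula and $\phi$ a domain formula for
$\psi$. Then

$$\psi \cong (\psi \land \phi) \lor (\mathsf{undef} \land \lnot \phi)$$

Proof. Let $\sigma$ be an arbitrary valuation. Let
$v = \llbracket \psi \rrbracket \sigma$. If
$v \in \{\mathsf{true}, \mathsf{false}\}$ then
$\llbracket \phi \rrbracket \sigma = \mathsf{true}$ and

$$\llbracket (\psi \land \phi) \lor (\mathsf{undef} \land \lnot \phi) \rrbracket \sigma = (v \land \mathsf{true}) \lor (\mathsf{undef} \land \mathsf{false}) = v.$$

If $v = \mathsf{undef}$ then $\llbracket \phi \rrbracket \sigma = \mathsf{false}$, so

$$\llbracket (\psi \land \phi) \lor (\mathsf{undef} \land \lnot \phi) \rrbracket \sigma = (\mathsf{undef} \land \mathsf{false}) \lor (\mathsf{undef} \land \mathsf{true}) = \mathsf{undef}.$$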


Observe that $\psi \land \phi$ in Lemma 4 is a well-defined
conjunction. We use this property to construct domain formulas
using partial function domain descriptions.

Let

$$D_f = (\langle x_1, \ldots, x_k \rangle, B^f_1 \land \ldots \land B^f_{l_f})$$

for each partial function symbol $f \in \Sigma_1$ of arity $k$, where
$B^f_1, \ldots, B^f_{l_f}$ are unnested literals. If $t_1, \ldots, t_k$
are terms, we write $B^f_i(t_1, \ldots, t_k)$ for
$B^f_i[x_1 := t_1, \ldots, x_k := t_k]$. Let $subt(t)$ denote the set
of all subterms of term $t$.

For any literal $B(t_1, \ldots, t_n)$ where
$B(t_1, \ldots, t_n) \equiv R(t_1, \ldots, t_n)$ or
$B(t_1, \ldots, t_n) \equiv \lnot R(t_1, \ldots, t_n)$, define

$$\mathrm{DomForm}(B(t_1, \ldots, t_n)) = \bigwedge_{\substack{f(s_1, \ldots, s_k) \in \bigcup_{1 \le i \le n} subt(t_i) \\ 1 \le j \le l_f}} B^f_j(s_1, \ldots, s_k) \qquad (1)$$

Lemma 5 Let $B(t_1, \ldots, t_n)$ be a literal containing partial
function symbols. Then $\mathrm{DomForm}(B(t_1, \ldots, t_n))$ is a
domain formula for $B(t_1, \ldots, t_n)$.

Proof. Let $\sigma$ be a valuation. By strictness of interpretations
of function and predicate symbols,
$\llbracket B(t_1, \ldots, t_n) \rrbracket \sigma \ne \mathsf{undef}$
iff $\llbracket f(s_1, \ldots, s_k) \rrbracket \sigma \ne \bot$ for every
subterm $f(s_1, \ldots, s_k)$ of every term $t_i$, iff
$\llbracket B^f_j(s_1, \ldots, s_k) \rrbracket \sigma = \mathsf{true}$ for
every $1 \le j \le l_f$ and every subterm $f(s_1, \ldots, s_k)$.


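Lemma 6 Let $B$ be a literal and let

$$\mathrm{DomForm}(B) = F_1 \land \ldots \land F_m.$$

Then

$$B \cong (B \land F_1 \land \ldots \land F_m) \lor \bigvee_{1 \le i \le m} (\mathsf{undef} \land \lnot F_i \land \mathrm{DomForm}(F_i))$$

Proof. If $\llbracket B \rrbracket \sigma \ne \mathsf{undef}$, then
$\llbracket F_i \rrbracket \sigma = \mathsf{true}$ for every
$1 \le i \le m$, and

$$\llbracket \mathsf{undef} \land \lnot F_i \land \mathrm{DomForm}(F_i) \rrbracket \sigma = \mathsf{false}$$

so the right-hand side evaluates to $\llbracket B \rrbracket \sigma$ as
well. Now consider the case when
$\llbracket B \rrbracket \sigma = \mathsf{undef}$. Then there exists
a term $f(s_1, \ldots, s_k)$ such that
$\llbracket f(s_1, \ldots, s_k) \rrbracket \sigma = \bot$.
Because $\sigma(x) \ne \bot$ for every variable $x$, there exists a
term $f(s_1, \ldots, s_k)$ such that
$\llbracket f(s_1, \ldots, s_k) \rrbracket \sigma = \bot$ and
$\llbracket s_i \rrbracket \sigma \ne \bot$ for $1 \le i \le k$. Then
there exists a formula $F_p$ of form $B^f_j(s_1, \ldots, s_k)$ such that
$\llbracket B^f_j(s_1, \ldots, s_k) \rrbracket \sigma = \mathsf{false}$,
and

$$\llbracket \mathsf{undef} \land \lnot F_p \land \mathrm{DomForm}(F_p) \rrbracket \sigma = \mathsf{undef}.$$

Because

$$\llbracket B \land F_1 \land \ldots \land F_m \rrbracket \sigma = \mathsf{false},$$

and for every $q$,

$$\llbracket \mathsf{undef} \land \lnot F_q \land \mathrm{DomForm}(F_q) \rrbracket \sigma \in \{\mathsf{undef}, \mathsf{false}\},$$

the right-hand side evaluates to undef. ∎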


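Lemma 7 Let $\phi_0(\bar{y})$ and $\phi_1(\bar{y})$ be well-defined
formulas whose free variables are among $\bar{y}$ and let

$$\psi(\bar{y}) \equiv (\mathsf{undef} \land \phi_0(\bar{y})) \lor \phi_1(\bar{y})$$

If $\psi(\bar{y})$ is well-defined for all values of variables
$\bar{y}$, then

$$\psi(\bar{y}) \cong \phi_1(\bar{y})$$

Proof. Consider any valuation $\sigma$. Let

$$v = \llbracket \phi_1(\bar{y}) \rrbracket \sigma$$

and

$$v' = \llbracket \psi(\bar{y}) \rrbracket \sigma$$

We need to show $v = v'$. Because $\phi_1(\bar{y})$ and $\psi(\bar{y})$
are well-defined, $v, v' \in \{\mathsf{false}, \mathsf{true}\}$. We
consider two cases.

Case 1. $v = \mathsf{true}$. Then also $v' = \mathsf{true}$.

Case 2. $v = \mathsf{false}$. Then
$v' = \mathsf{undef} \land \llbracket \phi_0(\bar{y}) \rrbracket \sigma$.
Because $v' \ne \mathsf{undef}$, we conclude $v' = \mathsf{false}$. ∎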
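The case analysis in this proof is small enough to enumerate. The following sketch checks it pointwise, modeling the truth values as the chain false < undef < true with conjunction as minimum and disjunction as maximum; the function name `lemma7_cases` is hypothetical.

```python
RANK = {"false": 0, "undef": 1, "true": 2}


def k_and(x, y):
    """Kleene conjunction: minimum on the chain false < undef < true."""
    return min(x, y, key=RANK.get)


def k_or(x, y):
    """Kleene disjunction: maximum on the chain false < undef < true."""
    return max(x, y, key=RANK.get)


def lemma7_cases():
    """List (v0, v1, psi) for well-defined v0, v1 where psi is also well-defined."""
    cases = []
    for v0 in ("false", "true"):
        for v1 in ("false", "true"):
            psi = k_or(k_and("undef", v0), v1)
            if psi != "undef":
                cases.append((v0, v1, psi))
    return cases
```

In every case returned by `lemma7_cases`, the value of the whole formula equals the value of the second disjunct, which is the pointwise content of the lemma. The only excluded combination is the one where the first formula is true and the second is false, since there the whole formula evaluates to undef.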


Proposition 8 Every well-defined quantifier-free formula
$\psi$ can be transformed into an equivalent disjunction $\psi'$ of
well-defined conjunctions of literals.

Proof. Using the standard procedure, convert $\psi$ to a
disjunction of conjunctions

$$C_1 \lor \ldots \lor C_n$$

Let $C_i \equiv B \land C_i'$ where $B$ is a literal and let
$\mathrm{DomForm}(B) = F_1 \land \ldots \land F_m$. Replace
$B \land C_i'$ by

$$(B \land F_1 \land \ldots \land F_m \land C_i') \lor \bigvee_{1 \le j \le m} (\mathsf{undef} \land \lnot F_j \land \mathrm{DomForm}(F_j) \land C_i')$$

By Lemma 6 and distributivity, the result is an equivalent
formula. Repeat this process for every literal in
$C_1 \lor \ldots \lor C_n$. The result can be written in the form

$$(\mathsf{undef} \land \phi_1) \lor \ldots \lor (\mathsf{undef} \land \phi_p) \lor \phi_{p+1} \lor \ldots \lor \phi_{p+q} \qquad (2)$$

where each $\phi_i$ for $1 \le i \le p+q$ is a well-defined conjunction.
Formula (2) is equivalent to

$$(\mathsf{undef} \land (\phi_1 \lor \ldots \lor \phi_p)) \lor \phi_{p+1} \lor \ldots \lor \phi_{p+q} \qquad (3)$$

and is equivalent to the well-defined formula $\psi$, so it is
well-defined. Formulas $\phi_1 \lor \ldots \lor \phi_p$ and
$\phi_{p+1} \lor \ldots \lor \phi_{p+q}$ are also well-defined. By
Lemma 7 we conclude that formula (3) is equivalent to

$$\phi_{p+1} \lor \ldots \lor \phi_{p+q} \qquad (4)$$

Because (4) is a disjunction of well-defined formulas, (4) is
the desired result $\psi'$. ∎


The following proposition presents transformation to
unnested form for the structures with equality and partial
function symbols, building on Proposition 8. For a similar
unnested form in the first-order logic containing only total
function symbols, see [Page 58].

Proposition 9 Every well-defined quantifier-free formula
$\psi$ in a language with equality can be effectively transformed
into an equivalent formula $\psi'$ where $\psi'$ is a disjunction of
existentially quantified well-defined conjunctions of the
following kinds of literals:

• $R(x_1, \ldots, x_k)$ where $R$ is some relational symbol of
arity $k$ and $x_1, \ldots, x_k$ are variables;

• $\lnot R(x_1, \ldots, x_k)$ where $R$ is some relational symbol of
arity $k$ and $x_1, \ldots, x_k$ are variables;

• $x_1 = x_2$ where $x_1$, $x_2$ are variables;

• $x = f(x_1, \ldots, x_k)$ where $f$ is some partial or total
function symbol of arity $k$ and $x, x_1, \ldots, x_k$ are variables;

• $x_1 \ne x_2$ where $x_1$ and $x_2$ are variables.


Proof. Transform the formula to a disjunction of well-defined
conjunctions of literals as in the proof of Proposition 8.

Then repeatedly perform the following transformation
on each well-defined conjunction $\phi$. Let
$A(f(x_1, \ldots, x_k))$ be an atomic formula containing term
$f(x_1, \ldots, x_k)$. Replace
$\phi \land A(f(x_1, \ldots, x_k))$ with

$$\exists x_0.\ \phi \land x_0 = f(x_1, \ldots, x_k) \land A(x_0)$$

Replace $x \ne f(x_1, \ldots, x_k)$ with

$$\exists x_0.\ x_0 = f(x_1, \ldots, x_k) \land x_0 \ne x$$

Repeat this process until the resulting conjunction $\phi'$ is in
unnested form. $\phi'$ is clearly equivalent to the original
conjunction $\phi$ when all partial functions are well-defined. When
some partial function is not well-defined, then both $\phi$ and
$\phi'$ evaluate to false, because by construction of $\phi$ in the
proof of Proposition 8 each conjunction contains conjuncts that
evaluate to false when some application of a function symbol
is not well-defined. ∎


Let a left-strict conjunction in Kleene logic be denoted
by $\land'$ and defined by

$$p \land' q \equiv (p \land q) \lor (p \land \lnot p)$$

The correctness of the transformation to unnested form
in Proposition 9 relies on the presence of conjuncts that
ensure that the entire conjunction evaluates to false whenever
some term is undefined. The following Lemma 10 enables
transformation to unnested form in an arbitrary context,
allowing the transformation to unnested form to be performed
independently from ensuring well-definedness of conjuncts.


Lemma 10 Let $\phi(x)$ be a formula with free variable $x$ and
let $t$ be a term possibly containing partial function symbols.
Then

1. $\phi(t) \cong (\exists x.\ x = t \land \phi(x)) \lor (\mathsf{undef} \land \forall x.\ \lnot \phi(x))$;

2. $\phi(t) \cong \exists x.\ x = t \land' \phi(x)$;

3. $\phi(t) \cong (\exists x.\ x = t \land \phi(x)) \lor (t \ne t)$.

Proof. Straightforward. ∎


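Proposition 13 below shows that a simplification similar
to the one in Lemma 7 can be applied even within the scope
of quantifiers. To show Proposition 13 we first show two
lemmas.

Lemma 11 For all formulas $\phi_0(x, \bar{y})$ and $\phi_1(x, \bar{y})$,

$$\exists x.\ (\mathsf{undef} \land \phi_0(x, \bar{y})) \lor \phi_1(x, \bar{y}) \;\cong\; (\mathsf{undef} \land \exists x.\ \phi_0(x, \bar{y})) \lor \exists x.\ \phi_1(x, \bar{y})$$

Proof. By distributivity of quantifiers and propositional
connectives in Kleene logic we have:

$$\begin{array}{l}
\exists x.\ (\mathsf{undef} \land \phi_0(x, \bar{y})) \lor \phi_1(x, \bar{y}) \;\cong \\
(\exists x.\ \mathsf{undef} \land \phi_0(x, \bar{y})) \lor (\exists x.\ \phi_1(x, \bar{y})) \;\cong \\
(\mathsf{undef} \land \exists x.\ \phi_0(x, \bar{y})) \lor (\exists x.\ \phi_1(x, \bar{y}))
\end{array}$$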


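Lemma 12 For all formulas $\phi_0(x, \bar{y})$ and $\phi_1(x, \bar{y})$,

$$\forall x.\ (\mathsf{undef} \land \phi_0(x, \bar{y})) \lor \phi_1(x, \bar{y}) \;\cong\; (\mathsf{undef} \land \forall x.\ \phi_0(x, \bar{y}) \lor \phi_1(x, \bar{y})) \lor \forall x.\ \phi_1(x, \bar{y})$$

Proof. The following sequence of equivalences holds.

$$\begin{array}{l}
\forall x.\ (\mathsf{undef} \land \phi_0(x, \bar{y})) \lor \phi_1(x, \bar{y}) \;\cong \\
\lnot \exists x.\ \lnot ((\mathsf{undef} \land \phi_0(x, \bar{y})) \lor \phi_1(x, \bar{y})) \;\cong \\
\lnot \exists x.\ (\mathsf{undef} \lor \lnot \phi_0(x, \bar{y})) \land \lnot \phi_1(x, \bar{y}) \;\cong \\
\lnot \exists x.\ (\mathsf{undef} \land \lnot \phi_1(x, \bar{y})) \lor (\lnot \phi_0(x, \bar{y}) \land \lnot \phi_1(x, \bar{y})) \;\cong \\
\lnot ((\mathsf{undef} \land \exists x.\ \lnot \phi_1(x, \bar{y})) \lor (\exists x.\ \lnot \phi_0(x, \bar{y}) \land \lnot \phi_1(x, \bar{y}))) \;\cong \\
(\mathsf{undef} \lor \forall x.\ \phi_1(x, \bar{y})) \land (\forall x.\ \phi_0(x, \bar{y}) \lor \phi_1(x, \bar{y})) \;\cong \\
(\mathsf{undef} \land \forall x.\ \phi_0(x, \bar{y}) \lor \phi_1(x, \bar{y})) \lor \forall x.\ \phi_1(x, \bar{y})
\end{array}$$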
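Both quantifier lemmas can be sanity-checked by brute force on small finite domains. The sketch below assumes the usual lattice reading of Kleene quantifiers (existential as maximum, universal as minimum over the chain false < undef < true, encoded as 0 < 1 < 2); the function names are introduced here for illustration and the check is not a substitute for the proofs.

```python
from itertools import product

FALSE, UNDEF, TRUE = 0, 1, 2  # the chain false < undef < true
VALUES = (FALSE, UNDEF, TRUE)


def lemma11_holds(a, b):
    """a[x], b[x] are the values of phi0 and phi1 at domain element x."""
    lhs = max(max(min(UNDEF, ax), bx) for ax, bx in zip(a, b))
    rhs = max(min(UNDEF, max(a)), max(b))
    return lhs == rhs


def lemma12_holds(a, b):
    """Same encoding as lemma11_holds, for the universal quantifier."""
    lhs = min(max(min(UNDEF, ax), bx) for ax, bx in zip(a, b))
    rhs = max(min(UNDEF, min(max(ax, bx) for ax, bx in zip(a, b))), min(b))
    return lhs == rhs


def check_quantifier_lemmas(max_domain_size=3):
    """Check both equivalences for every assignment on domains up to the given size."""
    for n in range(1, max_domain_size + 1):
        for a in product(VALUES, repeat=n):
            for b in product(VALUES, repeat=n):
                if not (lemma11_holds(a, b) and lemma12_holds(a, b)):
                    return False
    return True
```

Calling `check_quantifier_lemmas()` enumerates every pair of three-valued assignments on domains of one to three elements and compares both sides of each equivalence.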


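Proposition 13 Let $\phi_0(\bar{x}, \bar{y})$ and $\phi_1(\bar{x}, \bar{y})$
be well-defined formulas whose free variables are among
$\bar{x}, \bar{y}$ and let

$$\psi(\bar{y}) \equiv Q_1 x_1 \ldots Q_n x_n.\ (\mathsf{undef} \land \phi_0(\bar{x}, \bar{y})) \lor \phi_1(\bar{x}, \bar{y})$$

where $Q_1, \ldots, Q_n$ are quantifiers. If $\psi(\bar{y})$ is
well-defined for all values of variables $\bar{y}$, then

$$\psi(\bar{y}) \cong Q_1 x_1 \ldots Q_n x_n.\ \phi_1(\bar{x}, \bar{y})$$

Proof. Applying successively Lemmas 11 and 12 to quantifiers
$Q_n, \ldots, Q_1$, we conclude

$$\psi(\bar{y}) \cong (\mathsf{undef} \land \phi_2(\bar{y})) \lor Q_1 x_1 \ldots Q_n x_n.\ \phi_1(\bar{x}, \bar{y})$$

for some formula $\phi_2(\bar{y})$. Then by Lemma 7,

$$\psi(\bar{y}) \cong Q_1 x_1 \ldots Q_n x_n.\ \phi_1(\bar{x}, \bar{y}).$$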


3 Some Quantifier Elimination Procedures 


As a preparation for the proof of the decidability of term
algebras of decidable theories, we present quantifier elimination
procedures for some theories that are known to admit
quantifier elimination. We use the results and ideas from
this section to show the new results in Sections 4, 5, 6.


3.1 Quantifier Elimination 


Our technique for showing decidability of structural subtyping
of non-recursive types is based on quantifier elimination.
This section gives some general remarks on quantifier
elimination.

We follow in describing quantifier elimination procedures.
According to [Page 70, Lemma 2.7.4] it suffices
to eliminate $\exists y$ from formulas of the form

$$\exists y.\ \bigwedge_{0 \le i < n} \psi_i(\bar{x}, y) \qquad (5)$$

where $\bar{x}$ is a tuple of variables and $\psi_i(\bar{x}, y)$ is a
literal all of whose variables are among $\bar{x}, y$. The reason why
eliminating quantifiers from formulas of the form (5) suffices is
the following. Suppose that the formula is in prenex form and
consider the innermost quantifier of the formula. Let $\phi$ be the
subformula containing the quantifier and the subformula that is
the scope of the quantifier. If $\phi$ is of the form
$\forall x.\ \phi_0$ we may replace $\phi$ with
$\lnot \exists x.\ \lnot \phi_0$. Hence, we may assume that $\phi$ is of
the form $\exists x.\ \phi_1$. We then transform $\phi_1$ into
disjunctive normal form and use the fact

$$\exists x.\ (\phi_2 \lor \phi_3) \iff (\exists x.\ \phi_2) \lor (\exists x.\ \phi_3) \qquad (6)$$

We conclude that elimination of quantifiers from formulas of
form (5) suffices to eliminate the innermost quantifier. By
repeatedly eliminating innermost quantifiers we can eliminate
all quantifiers from a formula.

We may also assume that $y$ occurs in every literal $\psi_i$,
otherwise we would place the literal outside the existential
quantifier using the fact

$$\exists y.\ (A \land B) \iff (\exists y.\ A) \land B$$

for $y$ not occurring in $B$.

To eliminate variables we often use the following identity
of a theory with equality:

$$\exists x.\ x = t \land \phi(x) \iff \phi(t) \qquad (7)$$

Section 2.3 presents analogous identities for partial
functions.

The quantifier elimination procedures we give imply the
decidability of the underlying theories. In this paper the
interpretations of function and relation symbols on some domain
$A$ are effectively computable functions and relations on $A$.
Therefore, the truth value of every formula without variables
is computable. The quantifier elimination procedures
we present are all effective. To determine the truth value of
a closed formula $\phi$ it therefore suffices to apply the
quantifier elimination procedure to $\phi$, yielding a
quantifier-free formula $\psi$, and then evaluate the truth value
of $\psi$.


3.2 Quantifier Elimination for Boolean Algebras 


This section presents a quantifier elimination procedure for
finite boolean algebras. This result dates back at least to
[46], see also [51], [27], [32], [6], [49], [Section 2.7, Exercise 3].
Note that the operations union, intersection and complement
are definable in the first-order language of the subset
relation. Therefore, quantifier elimination for the first-order
theory of the boolean algebra of sets is no harder than the
quantifier elimination for the first-order theory of the subset
relation. However, the operations of boolean algebra are
useful in the process of quantifier elimination, so we give the
quantifier elimination procedure for the language containing
boolean algebra operations.

Instead of the first-order theory of the subset relation
we could consider monadic second-order theory with no
relation or function symbols. These two languages are
equivalent because the first-order quantifiers can be eliminated
from monadic second-order theory using the subset relation
(see Section 7.1).

Finite boolean algebras are isomorphic to boolean algebras
whose elements are all subsets of some finite set. We
therefore use the symbols for the set operations as the
language of boolean algebras. $t_1 \cap t_2$, $t_1 \cup t_2$, $t^c$,
0, 1 correspond to set intersection, set union, set complement,
empty set, and full set, respectively. We write
$t_1 \subseteq t_2$ for $t_1 \cap t_2 = t_1$, and we write
$t_1 \subset t_2$ for the conjunction
$t_1 \subseteq t_2 \land t_1 \ne t_2$.

For every nonnegative integer $k$ we introduce formulas
$|t| \ge k$ expressing that the set denoted by $t$ has at least
$k$ elements, and formulas $|t| = k$ expressing that the set
denoted by $t$ has exactly $k$ elements. These properties are
first-order definable as follows.

$$\begin{array}{rcl}
|t| \ge 0 & \equiv & \mathsf{true} \\
|t| \ge k+1 & \equiv & \exists x.\ x \subset t \land |x| \ge k \\
|t| = k & \equiv & |t| \ge k \land \lnot\, |t| \ge k+1
\end{array}$$


We call a language which contains formulas $|t| \ge k$ and
$|t| = k$ the language of boolean algebras with finite cardinality
constraints. Because finite cardinality constraints are
first-order definable, the language with finite cardinality
constraints is equally expressive as the language of boolean
algebras.

Every inequality $t_1 \subseteq t_2$ is equivalent to the equality
$t_1 \cap t_2 = t_1$, and every equality $t_3 = t_4$ is equivalent to
the cardinality constraint

$$|(t_3 \cap t_4^c) \cup (t_4 \cap t_3^c)| = 0$$

It is therefore sufficient to consider the first-order formulas
whose only atomic formulas are of the form $|t| = 0$. For
the purpose of quantifier elimination we will additionally
consider formulas that contain atomic formulas $|t| = k$ for all
$k \ge 1$, as well as $|t| \ge k$ for $k \ge 0$.

Note that we can eliminate negative literals as follows:

$$\begin{array}{rcl}
\lnot\, |t| = k & \iff & |t| = 0 \lor \cdots \lor |t| = k-1 \lor |t| \ge k+1 \\
\lnot\, |t| \ge k & \iff & |t| = 0 \lor \cdots \lor |t| = k-1
\end{array} \qquad (8)$$

Every formula in the language of boolean algebras can therefore
be written in prenex normal form where the matrix of
the formula is a disjunction of conjunctions of atomic formulas
of the form $|t| = k$ and $|t| \ge k$, with no negative
literals.

Note that if a term $t$ contains at least one operation of
arity one or more, we may assume that the constants 0 and
1 do not appear in $t$, because 0 and 1 can be simplified away.
Furthermore, the expression $|0|$ denotes the integer zero, so
all formulas of form $|0| = k$ or $|0| \ge k$ evaluate to true or
false. We can therefore simplify every nontrivial term $t$ so that
either $t$ contains no occurrences of constants 0 and 1, or
$t \equiv 1$.

We next describe a quantifier elimination procedure for
finite boolean algebras.

We first transform the formula into prenex normal form
and then repeatedly eliminate the innermost quantifier. As
argued in Section 3.1, it suffices to show that we can eliminate
an existential quantifier from any existentially quantified
conjunction of literals. Consider therefore an arbitrary
existentially quantified conjunction of literals

$$\exists y.\ \bigwedge_{1 \le i \le n} \psi_i(\bar{x}, y)$$

where $\psi_i$ is of the form $|t| = k$ or of the form $|t| \ge k$. We
assume that $y$ occurs in every formula $\psi_i$. It follows that no
$\psi_i$ contains $|0|$ or $|1|$.

Let $x_1, \ldots, x_m, y$ be the set of variables occurring in
formulas $\psi_i$ for $1 \le i \le n$.

First consider the more general case $m \ge 1$. Let for
$i_1, \ldots, i_m \in \{0, 1\}$,

$$t_{i_1 \ldots i_m} = x_1^{i_1} \cap \cdots \cap x_m^{i_m}$$

where $t^0 = t$ and $t^1 = t^c$. The terms in the set

$$P = \{ t_{i_1 \ldots i_m} \mid i_1, \ldots, i_m \in \{0, 1\} \}$$


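original formula                                              eliminated form
$\exists y.\ |s \cap y| \ge k \land |s \cap y^c| \ge l$       $|s| \ge k + l$
$\exists y.\ |s \cap y| = k \land |s \cap y^c| \ge l$         $|s| \ge k + l$
$\exists y.\ |s \cap y| \ge k \land |s \cap y^c| = l$         $|s| \ge k + l$
$\exists y.\ |s \cap y| = k \land |s \cap y^c| = l$           $|s| = k + l$

Figure 1: Rules for Eliminating Quantifiers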


form a partition; moreover every boolean algebra expression
whose variables are among $x_i$ can be written as a disjoint
union of some elements of the partition $P$. Any boolean
algebra expression containing $y$ can be written, for some
$p, q \ge 0$, as

$$(s_1 \cap y) \cup \cdots \cup (s_p \cap y) \cup (t_1 \cap y^c) \cup \cdots \cup (t_q \cap y^c)$$

where $s_1, \ldots, s_p \in P$ are pairwise distinct elements from the
partition and $t_1, \ldots, t_q \in P$ are pairwise distinct elements
from the partition. Because

$$|(s_1 \cap y) \cup \cdots \cup (s_p \cap y) \cup (t_1 \cap y^c) \cup \cdots \cup (t_q \cap y^c)| = |s_1 \cap y| + \cdots + |s_p \cap y| + |t_1 \cap y^c| + \cdots + |t_q \cap y^c|$$

the constraint of form $|t| = k$ can be written as

$$\bigvee_{k_1, \ldots, k_p, l_1, \ldots, l_q} |s_1 \cap y| = k_1 \land \cdots \land |s_p \cap y| = k_p \land |t_1 \cap y^c| = l_1 \land \cdots \land |t_q \cap y^c| = l_q$$

where the disjunction ranges over nonnegative integers
$k_1, \ldots, k_p, l_1, \ldots, l_q \ge 0$ that satisfy

$$k_1 + \cdots + k_p + l_1 + \cdots + l_q = k$$

From (8) it follows that we can perform a similar transformation
for constraints of form $|t| \ge k$. After performing this
transformation, we bring the formula into disjunctive normal
form and continue eliminating the existential quantifier
separately for each disjunct, as argued in Section 3.1. We
may therefore assume that all conjuncts $\psi_i$ are of one of the
forms: $|s \cap y| = k$, $|s \cap y^c| = k$, $|s \cap y| \ge k$, and
$|s \cap y^c| \ge k$ where $s \in P$.

If there are two conjuncts both of which contain $|s \cap y|$ for
the same $s$, then either they are contradictory or one implies
the other. We therefore assume that for any $s \in P$, there is
at most one conjunct $\psi_i$ containing $|s \cap y|$. For analogous
reasons we assume that for every $s \in P$ there is at most one
conjunct $\psi_i$ containing $|s \cap y^c|$. The result of eliminating
the variable $y$ is then given in Figure 1. The case when a
literal containing $|s \cap y|$ does not occur is covered by the
case $|s \cap y| \geq k$ for $k = 0$, and similarly for a literal containing
$|s \cap y^c|$.
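
The rules of Figure 1 can be exercised on small finite sets. The following Python sketch is only an illustration: the function names and the encoding of a literal as a pair `(op, bound)` with `op` either `"="` or `">="` are assumptions of this example, not notation from the text. It computes the eliminated constraint on $|s|$ and compares it against a brute-force search over all choices of $y$.

```python
from itertools import combinations

def holds(op, bound, n):
    """Evaluate the cardinality literal 'n op bound' for op in {'=', '>='}."""
    return n == bound if op == "=" else n >= bound

def eliminate_y(lit_y, lit_yc):
    """Figure 1: from literals on |s ∩ y| and |s ∩ y^c| derive one on |s|.

    The result uses '=' only when both inputs use '='; the bound is k + l.
    """
    (op1, k), (op2, l) = lit_y, lit_yc
    op = "=" if op1 == "=" and op2 == "=" else ">="
    return (op, k + l)

def exists_y_brute_force(s, lit_y, lit_yc):
    """Decide 'exists y. |s ∩ y| op1 k and |s ∩ y^c| op2 l' by enumeration.

    Only y ∩ s matters, so it suffices to enumerate the subsets of s.
    """
    elems = list(s)
    for size in range(len(elems) + 1):
        for part in combinations(elems, size):
            inside = len(part)               # |s ∩ y|
            outside = len(elems) - inside    # |s ∩ y^c|
            if holds(lit_y[0], lit_y[1], inside) and \
               holds(lit_yc[0], lit_yc[1], outside):
                return True
    return False

def check_all(max_size=5, max_bound=4):
    """Compare the rule with brute force on all small instances."""
    for n in range(max_size + 1):
        s = set(range(n))
        for op1 in ("=", ">="):
            for op2 in ("=", ">="):
                for k in range(max_bound + 1):
                    for l in range(max_bound + 1):
                        op, bound = eliminate_y((op1, k), (op2, l))
                        expected = exists_y_brute_force(s, (op1, k), (op2, l))
                        if holds(op, bound, n) != expected:
                            return False
    return True
```

The check is exhaustive only up to the chosen bounds, so it is a sanity test of the four rules rather than a proof.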

It remains to consider the case $m = 0$. Then $y$ is the
only variable occurring in conjuncts $\psi_i$. Every cardinality
expression $t$ containing only $y$ reduces to one of $|y|$ or $|y^c|$.
If there are multiple literals containing $|y|$, they are either
contradictory or one implies the others. We may therefore
assume there is at most one literal containing $|y|$ and at
most one literal containing $|y^c|$. We eliminate the quantifier by
applying the rules in Figure 1, putting formally $s = 1$ where 1 is
the universal set.


This completes the description of quantifier elimination
from an existentially quantified conjunction. By repeating
this process for all quantifiers we arrive at a quantifier-free
formula $\psi$. Hence we have the following theorem.


Theorem 14 For every first-order formula $\phi$ in the language
of boolean algebras with finite cardinality constraints
there exists a quantifier-free formula $\psi$ such that $\psi$ is a
disjunction of conjunctions of literals of form $|t| \geq k$ and $|t| = k$
where $t$ are terms of boolean algebra, the free variables of $\psi$
are a subset of the free variables of $\phi$, and $\psi$ is equivalent to
$\phi$ on all algebras of finite sets.


Remark 15 Now consider the case when formula $\phi$ has no
free variables. By Theorem 14, $\phi$ is equivalent to $\psi$ where $\psi$
contains only terms without variables. A term without variables
in boolean algebra can always be simplified to 0 or 1.
Because $|0| = 0$, the literals with $|0|$ reduce to true or false,
so we may simplify them away. The expression $|1|$ evaluates
to the number of elements in the boolean algebra. We call
literals $|1| = k$ and $|1| \geq k$ domain cardinality constraints. A
quantifier-free formula $\psi$ can therefore be written as a
propositional combination of domain cardinality constraints. We
can simplify $\psi$ into a disjunction of conjunctions of domain
cardinality constraints and transform each conjunction so
that it contains at most one literal. The result $\psi'$ is a single
disjunction of domain cardinality constraints. We may
further assume that the disjunct of form $|1| \geq k$ occurs at
most once. Therefore, the truth value of each closed boolean
algebra formula is characterized by a set $C$ of possible
cardinalities of the domain. If $\psi'$ does not contain any $|1| \geq k$
literals, the set $C$ is finite. Otherwise, $C = C_0 \cup \{k, k+1, \ldots\}$
for some $k$ where $C_0$ is a finite subset of $\{1, \ldots, k-1\}$.
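
The normal form of Remark 15 is easy to compute once a closed formula has been reduced to a disjunction of domain cardinality constraints. The sketch below is a hedged illustration: the list-of-pairs input encoding and the names `normalize` and `true_in_domain_of_size` are assumptions made for this example.

```python
def normalize(disjuncts):
    """Collapse a disjunction of literals |1| = k and |1| >= k.

    disjuncts is a list of pairs (op, k) with op in {'=', '>='}.
    Returns (C0, threshold): the finite part of the set C and the least
    bound of a '>=' literal, or None if no such literal occurs.
    """
    bounds = [k for op, k in disjuncts if op == ">="]
    threshold = min(bounds) if bounds else None
    # Equalities at or above the threshold are subsumed by the '>=' literal.
    c0 = {k for op, k in disjuncts
          if op == "=" and (threshold is None or k < threshold)}
    return c0, threshold

def true_in_domain_of_size(normal_form, n):
    """Is the closed formula true when the universal set has n elements?"""
    c0, threshold = normal_form
    return n in c0 or (threshold is not None and n >= threshold)
```

For instance, the disjunction $|1| = 3 \lor |1| \geq 7 \lor |1| \geq 5 \lor |1| = 9$ normalizes to $C_0 = \{3\}$ with threshold 5.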


3.3 Feferman-Vaught Theorem


The Feferman-Vaught technique is a way of 
discovering the first-order theories of com- 
plex structures by analyzing their components. 
This description is a little vague, and in 
fact the Feferman-Vaught technique itself has 
something of a floating identity. It works 
for direct products, as we shall see. Clever 
people can make it work in other situations too. 


— [22], page 458 


We next review the Feferman-Vaught theorem for direct
products [14], which implies that the products of structures
with decidable first-order theories have decidable first-order
theories.

The result was first obtained for strong and weak powers
of theories in [35]; [35] also suggests the generalization
to products. Our sketch here mostly follows and [35],
see also Chapter 12] as well as Section 9.6]. Somewhat
specific to our presentation is the fact that we use a
multisorted logic and build into the language the correspondence
between formulas interpreted over $\mathcal{C}$ and the cylindric
algebra of sets of positions.

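Let $L_C$ be a relational language. Let further $I$ be some
nonempty finite or countably infinite index set. For each
$i \in I$ let $\mathcal{C}_i = (C_i, [\![\_]\!]^{\mathcal{C}_i})$ be a decidable structure interpreting
the language $L_C$.

We define the direct product of the family of structures $\mathcal{C}_i$,
$i \in I$, as the structure

\[ \mathcal{P} = \prod_{i \in I} \mathcal{C}_i \]

where $\mathcal{P} = (P, [\![\_]\!]^{\mathcal{P}})$. $P$ is the set of all functions $t$ such that
$t(i) \in C_i$ for $i \in I$, and $[\![\_]\!]^{\mathcal{P}}$ is defined by

\[ [\![r]\!]^{\mathcal{P}}(t_1, \ldots, t_k) = \forall i.\ [\![r]\!]^{\mathcal{C}_i}(t_1(i), \ldots, t_k(i)) \]

for each relation symbol $r \in L_C$.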


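inner formula relations for $r \in L_C$

    $r :: \mathrm{tuple}^k \to \mathrm{indset}$

inner logical connectives

    $\land^I, \lor^I :: \mathrm{indset} \times \mathrm{indset} \to \mathrm{indset}$
    $\lnot^I :: \mathrm{indset} \to \mathrm{indset}$
    $\mathrm{true}^I, \mathrm{false}^I :: \mathrm{indset}$

inner formula quantifiers

    $\exists^I, \forall^I :: (\mathrm{tuple} \to \mathrm{indset}) \to \mathrm{indset}$

index set equality

    $=^I :: \mathrm{indset} \times \mathrm{indset} \to \mathrm{bool}$

logical connectives

    $\land, \lor :: \mathrm{bool} \times \mathrm{bool} \to \mathrm{bool}$
    $\lnot :: \mathrm{bool} \to \mathrm{bool}$
    $\mathrm{true}, \mathrm{false} :: \mathrm{bool}$

index set quantifiers

    $\exists^L, \forall^L :: (\mathrm{indset} \to \mathrm{bool}) \to \mathrm{bool}$

tuple quantifiers

    $\exists, \forall :: (\mathrm{tuple} \to \mathrm{bool}) \to \mathrm{bool}$

Figure 2: Operations in product structure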


For the purpose of quantifier elimination we consider a
richer language of statements about the product structure $\mathcal{P}$.
Figure 2 shows this richer language. The corresponding
structure $\mathcal{P}_2 = (P_2, [\![\_]\!]^{\mathcal{P}_2})$ contains, in addition to the
function space $P$, a copy of the boolean algebra $2^I$ of subsets of
the index set $I$. We interpret a relation $r \in L_C$ by

\[ [\![r]\!]^{\mathcal{P}_2}(t_1, \ldots, t_k) = \{ i \mid [\![r]\!]^{\mathcal{C}_i}(t_1(i), \ldots, t_k(i)) \} \]

We let $[\![\mathrm{true}^I]\!]^{\mathcal{P}_2} = I$ and write

\[ r(t_1, \ldots, t_k) =^I \mathrm{true}^I \]

to express $[\![r]\!]^{\mathcal{P}}(t_1, \ldots, t_k)$. Hence $\mathcal{P}_2$ is at least as expressive
as $\mathcal{P}$.

Note that Figure 2 does not contain an equality relation
between tuples. If we need to express the equality between
tuples, we assume that some binary relation $r_0 \in L_C$ in the
base structure is interpreted as equality, and express the
equality between tuples $t_1$ and $t_2$ using the formula:

\[ r_0(t_1, t_2) =^I \mathrm{true}^I. \]


Figure 3 shows the semantics of the language in Figure 2.
(The logic has no partial functions, so we interpret the sort
bool over the set {true, false}.)


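inner formula relations for $r \in L_C$

    $[\![r]\!]^{\mathcal{P}_2}(t_1, \ldots, t_k) = \{ i \mid [\![r]\!]^{\mathcal{C}_i}(t_1(i), \ldots, t_k(i)) \}$

inner logical connectives

    $[\![\land^I]\!]^{\mathcal{P}_2}(A_1, A_2) = A_1 \cap A_2$
    $[\![\lor^I]\!]^{\mathcal{P}_2}(A_1, A_2) = A_1 \cup A_2$
    $[\![\lnot^I]\!]^{\mathcal{P}_2}(A) = I \setminus A$
    $[\![\mathrm{true}^I]\!]^{\mathcal{P}_2} = I$
    $[\![\mathrm{false}^I]\!]^{\mathcal{P}_2} = \emptyset$

inner formula quantifiers

    $[\![\exists^I]\!]^{\mathcal{P}_2} f = \bigcup_{t \in P} f(t)$
    $[\![\forall^I]\!]^{\mathcal{P}_2} f = \bigcap_{t \in P} f(t)$

index set equality

    $[\![=^I]\!]^{\mathcal{P}_2}(A_1, A_2) = (A_1 = A_2)$

logical connectives

    (interpreted as usual)

index set quantifiers

    $[\![\exists^L]\!]^{\mathcal{P}_2} f = \exists A \subseteq I.\ f(A)$
    $[\![\forall^L]\!]^{\mathcal{P}_2} f = \forall A \subseteq I.\ f(A)$

tuple quantifiers

    $[\![\exists]\!]^{\mathcal{P}_2} f = \exists t \in P.\ f(t)$
    $[\![\forall]\!]^{\mathcal{P}_2} f = \forall t \in P.\ f(t)$

Figure 3: Semantics of operations in product structure $\mathcal{P}_2$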
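
To make the two-sorted semantics concrete, here is a small Python sketch for a finite power of a finite base structure. Everything specific in it is an assumption chosen for illustration: the index set `I`, the carrier `C`, the sample relation `leq`, and the function names. Inner formulas evaluate to sets of indices, and the inner quantifiers are a union and an intersection over all tuples.

```python
from itertools import product

I = range(3)                          # index set (assumed finite here)
C = [0, 1, 2]                         # carrier shared by every component
P = list(product(C, repeat=len(I)))   # all functions I -> C, as tuples

def leq(a, b):
    """A sample base relation r: the usual order on integers."""
    return a <= b

def rel(r, *ts):
    """Inner relation: the set of indices where r holds componentwise."""
    return frozenset(i for i in I if r(*(t[i] for t in ts)))

def and_I(a, b):
    return a & b

def or_I(a, b):
    return a | b

def not_I(a):
    return frozenset(I) - a

def exists_I(f):
    """Inner existential quantifier: union of f(t) over all t in P."""
    out = frozenset()
    for t in P:
        out |= f(t)
    return out

def forall_I(f):
    """Inner universal quantifier: intersection of f(t) over all t in P."""
    out = frozenset(I)
    for t in P:
        out &= f(t)
    return out

def holds_in_product(r, *ts):
    """r holds in the direct product iff it holds at every index."""
    return rel(r, *ts) == frozenset(I)
```

With `t1 = (0, 1, 2)`, the inner formula "for all t, t ≤ t1" evaluates to the set of indices at which `t1` takes the greatest element of `C`.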


We let $A_1 \subseteq^I A_2$ stand for $A_1 \land^I A_2 =^I A_1$.

Note that the interpretations of $\land^I$, $\lor^I$, $\lnot^I$, $\mathrm{true}^I$, $\mathrm{false}^I$,
$\exists^L$, $\forall^L$ form a first-order structure of boolean algebras of
subsets of the set $I$. We call formulas in this boolean algebra
sublanguage index-set algebra formulas.

On the other hand, relations $r$ for $r \in L_C$, together with
$\land^I$, $\lor^I$, $\lnot^I$, $\exists^I$, $\forall^I$ form the signature of first-order logic with
relation symbols. We call formulas built only from these
operations inner formulas.

Let $\phi$ be an inner formula with free tuple variables
$t_1, \ldots, t_m$ and no free indset variables. Then $\phi$ specifies a
relation $\rho \subseteq D^m$. Consider the corresponding first-order
formula $\phi'$ interpreted in the base structure $\mathcal{C}$; formula $\phi'$
specifies a relation $\rho' \subseteq C^m$. The following property follows
from the semantics in Figure 3:

\[ \rho(t_1, \ldots, t_m) = \{ i \in I \mid \rho'(t_1(i), \ldots, t_m(i)) \} \tag{9} \]

Sort constraints imply that quantifiers $\exists^I$, $\forall^I$ are only applied
to inner formulas. Let $\phi$ be a formula of sort bool. By labelling
subformulas of sort indset with variables $A_1, \ldots, A_n$,
we can write $\phi$ in form $\phi^1$:

\[
\begin{array}{l}
\exists^L A_1, \ldots, A_n. \\
\quad A_1 =^I \phi_1 \land \cdots \land A_n =^I \phi_n \land {} \\
\quad \psi(A_1, \ldots, A_n)
\end{array}
\]

where $\phi = \psi(\phi_1, \ldots, \phi_n)$.
Furthermore, by defining $B_1, \ldots, B_m$ to be the partition of
$\mathrm{true}^I$ consisting of terms of form

\[ A_1^{p_1} \land^I \cdots \land^I A_n^{p_n} \]

for $p_1, \ldots, p_n \in \{0, 1\}$, we can find a formula $\psi'$ and formulas
$\phi'_1, \ldots, \phi'_m$ such that $\phi^1$ is equivalent to $\phi^2$:

\[
\begin{array}{l}
\exists^L B_1, \ldots, B_m. \\
\quad B_1 =^I \phi'_1 \land \cdots \land B_m =^I \phi'_m \land {} \\
\quad \psi'(B_1, \ldots, B_m)
\end{array}
\tag{10}
\]

and where $\phi'_1, \ldots, \phi'_m$ evaluate to sets that form a partition of
$\mathrm{true}^I$ for all values of free variables. (By partition of $\mathrm{true}^I$ we
here mean a family of pairwise disjoint sets whose union is
$\mathrm{true}^I$, but we do not require the sets to be non-empty.)

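Now consider a formula of form $\exists t. \phi$ where $\phi$ is without
$\exists, \forall$ quantifiers (but possibly contains $\exists^I, \forall^I$ and $\exists^L, \forall^L$
quantifiers). We transform $\phi$ into $\phi^2$ as described, and then
replace

\[
\begin{array}{l}
\exists t.\ \exists^L B_1, \ldots, B_m. \\
\quad B_1 =^I \phi'_1 \land \cdots \land B_m =^I \phi'_m \land {} \\
\quad \psi'(B_1, \ldots, B_m)
\end{array}
\tag{11}
\]

with

\[
\begin{array}{l}
\exists^L D_1, \ldots, D_m.\ \exists^L B_1, \ldots, B_m. \\
\quad D_1 =^I (\exists^I t. \phi'_1) \land \cdots \land D_m =^I (\exists^I t. \phi'_m) \land {} \\
\quad B_1 \subseteq^I D_1 \land \cdots \land B_m \subseteq^I D_m \land {} \\
\quad \mathrm{partition}(B_1, \ldots, B_m) \land \psi'(B_1, \ldots, B_m)
\end{array}
\tag{12}
\]

where $\mathrm{partition}(B_1, \ldots, B_m)$ denotes a boolean algebra
expression expressing that sets $B_1, \ldots, B_m$ form the partition
of $\mathrm{true}^I$.

It is easy to see that (11) and (12) are equivalent.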
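
The equivalence of (11) and (12) can be checked by brute force on a tiny power structure. The sketch below is an assumption-laden illustration, not part of the construction: it fixes a two-element index set, a three-element carrier, the sample relation `r` (strict order), and the two-cell partition given by $r(t, t_1)$ and its negation. The condition $\psi'$ ranges over all predicates on the sizes of the two cells.

```python
from itertools import product, combinations

I = (0, 1)                            # index set
C = (0, 1, 2)                         # carrier of the base structure
P = list(product(C, repeat=len(I)))   # all tuples

def r(a, b):
    """Sample base relation."""
    return a < b

def cells(t, t1):
    """Index sets of phi'_1 = r(t, t1) and phi'_2 = not r(t, t1).

    For every t they are complementary, hence a partition of I.
    """
    b1 = frozenset(i for i in I if r(t[i], t1[i]))
    return b1, frozenset(I) - b1

def lhs(t1, psi):
    """Formula (11): some tuple t whose cells satisfy psi."""
    return any(psi(*cells(t, t1)) for t in P)

def subsets(s):
    s = list(s)
    return [frozenset(c) for n in range(len(s) + 1)
            for c in combinations(s, n)]

def rhs(t1, psi):
    """Formula (12): D_i collects the indices where phi'_i is satisfiable."""
    d1 = frozenset(i for i in I if any(r(c, t1[i]) for c in C))
    d2 = frozenset(i for i in I if any(not r(c, t1[i]) for c in C))
    for b1 in subsets(d1):                # B1 is a subset of D1
        b2 = frozenset(I) - b1            # partition(B1, B2)
        if b2 <= d2 and psi(b1, b2):      # B2 is a subset of D2
            return True
    return False

def equivalent_everywhere():
    """Compare (11) and (12) for every t1 and every size-based psi."""
    size_pairs = [(k, len(I) - k) for k in range(len(I) + 1)]
    for mask in range(2 ** len(size_pairs)):
        allowed = {p for j, p in enumerate(size_pairs) if (mask >> j) & 1}
        psi = lambda b1, b2, allowed=allowed: (len(b1), len(b2)) in allowed
        if any(lhs(t1, psi) != rhs(t1, psi) for t1 in P):
            return False
    return True
```

The reason the two sides agree is visible in `rhs`: once each cell is contained in the set of indices where its formula is satisfiable, a witness can be chosen independently at every index and assembled into a single tuple.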

By repeating this construction we eliminate all term
quantifiers from a formula. We then eliminate all set
quantifiers as in Section 3.2. For that purpose we extend the
language with cardinality constraints.

As a result we obtain cardinality constraints on inner
formulas. Closed inner formulas evaluate to $\mathrm{true}^I$ or $\mathrm{false}^I$
depending on their truth value in base structure $\mathcal{C}$. Hence,
if $\mathcal{C}$ is decidable, so is $\mathcal{P}_2$.


Theorem 16 (Feferman-Vaught) Let $\mathcal{C}$ be a decidable
structure. Then every formula in the language of Figure 2 is
equivalent on the structure $\mathcal{P}_2$ to a propositional combination
of cardinality constraints of the index-set boolean algebra, i.e.
formulas of form $|\phi| \geq k$ and $|\phi| = k$ where $\phi$ is an inner
formula.


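Example 17 Let $r \in L_C$ be a binary relation on structure
$\mathcal{C}$. Let us eliminate quantifier $\exists t$ from the formula $\phi(t_1, t_2)$:

\[
\begin{array}{l}
\exists t.\ \exists^L A_1, A_2, A_3. \\
\quad A_1 =^I r(t, t_1) \land A_2 =^I r(t_1, t) \land A_3 =^I r(t_2, t) \land {} \\
\quad |\lnot^I A_1| = 0 \land |\lnot^I A_2| = 0 \land |\lnot^I A_3| \geq 1
\end{array}
\]

We first introduce sets $B_0, \ldots, B_7$ that form a partition of
$\mathrm{true}^I$. The formula is then equivalent to $\phi_1$:

\[
\begin{array}{l}
\exists t.\ \exists^L B_0, B_1, B_2, B_3, B_4, B_5, B_6, B_7. \\
\quad B_0 =^I r(t, t_1) \land^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad B_1 =^I \lnot^I r(t, t_1) \land^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad B_2 =^I r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad B_3 =^I \lnot^I r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad B_4 =^I r(t, t_1) \land^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad B_5 =^I \lnot^I r(t, t_1) \land^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad B_6 =^I r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad B_7 =^I \lnot^I r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad \phi_0
\end{array}
\]

where

\[
\begin{array}{rl}
\phi_0 = & |B_1| = 0 \land |B_2| = 0 \land {} \\
 & |B_3| = 0 \land |B_5| = 0 \land {} \\
 & |B_6| = 0 \land |B_7| = 0 \land {} \\
 & |B_4| \geq 1
\end{array}
\]

We now eliminate the quantifier $\exists t$ from the formula $\phi_1$,
obtaining formula $\phi_2$:

\[
\begin{array}{l}
\exists^L D_0, D_1, D_2, D_3, D_4, D_5, D_6, D_7. \\
\quad D_0 =^I \exists^I t.\ r(t, t_1) \land^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad D_1 =^I \exists^I t.\ \lnot^I r(t, t_1) \land^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad D_2 =^I \exists^I t.\ r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad D_3 =^I \exists^I t.\ \lnot^I r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad D_4 =^I \exists^I t.\ r(t, t_1) \land^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad D_5 =^I \exists^I t.\ \lnot^I r(t, t_1) \land^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad D_6 =^I \exists^I t.\ r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad D_7 =^I \exists^I t.\ \lnot^I r(t, t_1) \land^I \lnot^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad \phi_3
\end{array}
\]

where

\[
\begin{array}{rl}
\phi_3 = & \exists^L B_0, B_1, B_2, B_3, B_4, B_5, B_6, B_7. \\
 & B_0 \subseteq^I D_0 \land \cdots \land B_7 \subseteq^I D_7 \land {} \\
 & \mathrm{partition}(B_0, \ldots, B_7) \land \phi_0
\end{array}
\]

We next apply quantifier elimination for boolean algebras
to formula $\phi_3$ and obtain formula $\phi'_3$:

\[ \phi'_3 = |D_4| \geq 1 \land |\lnot^I D_0 \land^I \lnot^I D_4| = 0 \]

Indeed, $\phi_0$ forces every $B_i$ other than $B_0$ and $B_4$ to be empty,
so $B_0$ and $B_4$ partition $\mathrm{true}^I$ with $B_0 \subseteq^I D_0$, $B_4 \subseteq^I D_4$
and $B_4$ non-empty. Hence $\phi(t_1, t_2)$ is equivalent to

\[
\begin{array}{l}
\exists^L D_0, D_4. \\
\quad D_0 =^I \exists^I t.\ r(t, t_1) \land^I r(t_1, t) \land^I r(t_2, t) \land {} \\
\quad D_4 =^I \exists^I t.\ r(t, t_1) \land^I r(t_1, t) \land^I \lnot^I r(t_2, t) \land {} \\
\quad |D_4| \geq 1 \land |\lnot^I D_0 \land^I \lnot^I D_4| = 0
\end{array}
\]

After substituting the definitions of $D_0$ and $D_4$, formula
$\phi(t_1, t_2)$ can be written without quantifiers $\exists, \forall, \exists^L, \forall^L$.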


3.4 Term Algebras 


In this section we present a quantifier elimination procedure
for term algebras (see Section 2.1). A quantifier elimination
procedure for term algebras implies that the first-order
theory of term algebras is decidable. In the sections below we
build on the procedure in this section to define quantifier
elimination procedures for structural subtyping.

The decidability of the first-order theory of term algebras
follows from Mal'cev's work on locally free algebras
Chapter 23]. also gives an argument for decidability
of term algebra and presents a unification algorithm
based on congruence closure [38]. Infinite trees are studied
in [12]. presents a complete axiomatization for algebra
of finite, infinite and rational trees. A proof in the style of
for an extension of free algebra with queues is presented
in [43]. Decidability of an extension of term algebras with
membership tests is presented in in the form of a terminating
term rewriting system. Unification and disunification
problems are special cases of the decision problem for the first-order
theory of term algebras; for a survey see e.g. [9].

We believe that our proof provides some insight into
different variations of quantifier elimination procedures for
term algebras. Like [22] we use selector language symbols,
but retain the usual constructor symbols as well. The
advantage of the selector language is that $\exists y.\ z = f(x, y)$ is
equivalent to a quantifier-free formula $x = f_1(z) \land \mathrm{Is}_f(z)$.
On the other hand, constructor symbols also increase the
set of relations on terms definable via quantifier-free formulas,
which can slightly simplify the quantifier-elimination
procedure, as will be seen by comparing Proposition and
Proposition. Compared to [22, Page 70], we find that the
termination of our procedure is more evident and the
extension to the term-power algebra in Section 6 easier. Our
base formulas somewhat resemble formulas arising in other
quantifier elimination procedures [31] [11] [30]. Our terminology
also borrows from congruence closure graphs like those
of [39] [38], although we are not primarily concerned with
efficiency of the algorithm described. Term algebra is an
example of a theory of pairing functions, and [15] shows that
any non-empty family of theories of pairing functions has a
non-elementary lower bound on time complexity.


3.4.1 Term Algebra in Selector Language 


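To facilitate quantifier elimination we use a selector language
$\mathrm{Sel}(\Sigma)$ for term algebra Page 61]. We define term
algebra in selector language as a first-order structure with
partial functions.

The set $\mathrm{Sel}(\Sigma)$ contains, for every function symbol $f \in \Sigma$
of arity $\mathrm{ar}(f) = k$, a unary predicate $\mathrm{Is}_f \subseteq \mathrm{FT}(\Sigma)$ and
functions $f_1, \ldots, f_k : \mathrm{FT}(\Sigma) \to \mathrm{FT}(\Sigma)$ such that

\[ \mathrm{Is}_f(t) \iff \exists t_1, \ldots, t_k.\ t = f(t_1, \ldots, t_k) \tag{13} \]
\[ f_i(f(t_1, \ldots, t_k)) = t_i, \quad 1 \leq i \leq k \tag{14} \]
\[ f_i(t) = \bot, \quad \lnot \mathrm{Is}_f(t) \tag{15} \]

For every $f \in \Sigma$ and $1 \leq i \leq \mathrm{ar}(f)$, expression $f_i(t)$ is defined
iff $\mathrm{Is}_f(t)$ holds, so we let $D_{f_i} = \langle x, \mathrm{Is}_f(x) \rangle$.

As a special case, if $d$ is a constant, then $\mathrm{ar}(d) = 0$ and
$\mathrm{Is}_d(t) \iff t = d$.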
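
Properties (13) to (15) can be mirrored directly on ground terms. In the following Python sketch, terms are nested tuples whose first component is the constructor name; this encoding, the sample signature `ARITY`, and the use of `None` for the undefined value $\bot$ are assumptions of the example.

```python
ARITY = {"f": 2, "g": 1, "a": 0, "b": 0}   # hypothetical signature

def is_f(f, t):
    """Predicate Is_f: does the term t have top-level constructor f?"""
    return t[0] == f

def select(f, i, t):
    """Selector f_i with 1 <= i <= ar(f).

    Returns the i-th argument of t when Is_f(t) holds, and None
    (standing for the undefined value) otherwise.
    """
    assert 1 <= i <= ARITY[f]
    if not is_f(f, t):
        return None
    return t[i]          # t = (f, t_1, ..., t_k), so t[i] is t_i

def top_symbols(t):
    """All f in the signature with Is_f(t); always exactly one symbol."""
    return [f for f in ARITY if is_f(f, t)]
```

For the term `("f", ("a",), ("g", ("b",)))`, the call `select("f", 2, ...)` returns `("g", ("b",))`, while `select("g", 1, ...)` is undefined because the top-level constructor is not `g`.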


Proposition 18 For every formula $\phi_1$ in the language
$\mathrm{Cons}(\Sigma)$ there exists an equivalent formula $\phi_2$ in the selector
language.

Proof Sketch. Because of the presence of the equality symbol,
every formula in language $\mathrm{Cons}(\Sigma)$ can be written in
unnested form such that every atomic formula is of one of two
forms: $x_1 = x_2$, or $f(x_1, \ldots, x_k) = y$, where $y$ and $x_i$ are
variables. We keep every formula $x_1 = x_2$ unchanged and
transform each formula

\[ f(x_1, \ldots, x_k) = y \]

into the well-defined conjunction

\[ x_1 = f_1(y) \land \cdots \land x_k = f_k(y) \land \mathrm{Is}_f(y) \]
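
The rewriting step in this proof sketch is purely syntactic. A minimal Python sketch, assuming formulas are represented as plain strings and using the hypothetical helper name `constructor_to_selector`:

```python
def constructor_to_selector(f, args, y):
    """Rewrite the atomic formula f(args) = y into selector literals.

    Produces x_1 = f_1(y) ∧ ... ∧ x_k = f_k(y) ∧ Is_f(y) as a string.
    For a constant (no arguments) only the literal Is_f(y) remains.
    """
    literals = [f"{x} = {f}_{i}({y})" for i, x in enumerate(args, start=1)]
    literals.append(f"Is_{f}({y})")
    return " ∧ ".join(literals)
```

The conjunction is well-defined because the literal `Is_f(y)` guarantees that every selector applied to `y` is inside its domain.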


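Note that predicates $\mathrm{Is}_f$ form a partition of the set of all
terms, i.e. the following formulas are valid:

\[
\begin{array}{ll}
\forall x.\ \bigvee_{f \in \Sigma} \mathrm{Is}_f(x) & \\
\forall x.\ \lnot(\mathrm{Is}_f(x) \land \mathrm{Is}_g(x)), & \text{for } f \neq g
\end{array}
\tag{16}
\]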

Figure 4: Quantifier Elimination for Term Algebra (schematic: a quantifier-free formula and a disjunction of base formulas, with each direction of conversion labelled by a Proposition)


A constructor-selector language contains both constructor
symbols $f \in \mathrm{Cons}(\Sigma)$ and selector symbols $f_i \in \mathrm{Sel}(\Sigma)$.


3.4.2 Quantifier Elimination 


We proceed to quantifier elimination for term algebra. A
schematic view of our proof is in Figure 4. The basic insight
is that any quantifier-free formula can be written in a
particular unnested form, as a disjunction of base formulas.
Base formulas trivially permit elimination of an existential
quantifier, yet every base formula can be converted back to
a quantifier-free formula.

A semi-base formula is almost a base formula, except
that it may be cyclic. We introduce cyclicity after explaining
the graph representation of a semi-base formula.


Definition 19 (Semi-Base Formula) A semi-base formula
$\beta$ with

• free variables $x_1, \ldots, x_m$,
• internal non-parameter variables $u_1, \ldots, u_p$, and
• internal parameter variables $u_{p+1}, \ldots, u_{p+q}$

is a formula of form

\[
\begin{array}{l}
\exists u_1, \ldots, u_n. \\
\quad \mathrm{distinct}(u_1, \ldots, u_n) \land {} \\
\quad \mathrm{structure}(u_1, \ldots, u_n) \land {} \\
\quad \mathrm{labels}(u_1, \ldots, u_n; x_1, \ldots, x_m)
\end{array}
\]

$\mathrm{distinct}(u_1, \ldots, u_n)$ enforces that variables are distinct:

\[ \mathrm{distinct}(u_1, \ldots, u_n) = \bigwedge_{1 \leq i < j \leq n} u_i \neq u_j . \]

$\mathrm{structure}(u_1, \ldots, u_n)$ specifies relationships between terms
denoted by variables:

\[ \mathrm{structure}(u_1, \ldots, u_n) = \bigwedge_{i=1}^{p} u_i = t_i(u_1, \ldots, u_n) \]

where each $t_i(u_1, \ldots, u_n)$ is a term of form $f(u_{i_1}, \ldots, u_{i_k})$
for $f \in \Sigma$, $k = \mathrm{ar}(f)$.

$\mathrm{labels}(u_1, \ldots, u_n; x_1, \ldots, x_m)$ identifies some free variables
with some parameter and non-parameter variables:

\[ \mathrm{labels}(u_1, \ldots, u_n; x_1, \ldots, x_m) = \bigwedge_{i=1}^{m} x_i = u_{j_i} \]

for some function $j : \{1, \ldots, m\} \to \{1, \ldots, n\}$.

We require each semi-base formula to satisfy the following
congruence closure property: there are no two distinct
variables $u_i$ and $u_j$ such that both $u_i = f(u_{l_1}, \ldots, u_{l_k})$
and $u_j = f(u_{l_1}, \ldots, u_{l_k})$ occur as conjuncts in formula
structure.

We denote by $U$ the set of internal variables of a given
semi-base formula, $U = \{u_1, \ldots, u_n\}$.


Definition 20 A semi-base formula in selector language is
obtained from the semi-base formula in constructor language by
replacing every conjunct of form

\[ u_i = f(u_{i_1}, \ldots, u_{i_k}) \]

with the well-defined conjunction

\[ \mathrm{Is}_f(u_i) \land u_{i_1} = f_1(u_i) \land \cdots \land u_{i_k} = f_k(u_i) \]

A semi-base formula in selector language is clearly a well-formed
conjunction of literals. All atomic formulas in a semi-base
formula are unnested, in both constructor and selector
language.

We can represent a semi-base formula as a labelled directed
graph with the set of nodes $U$; we call this graph the graph
associated with the semi-base formula. Nodes of the graph are
in a bijection with internal variables of the semi-base formula.
We call nodes corresponding to parameter variables
$u_{p+1}, \ldots, u_{p+q}$ parameter nodes; nodes $u_1, \ldots, u_p$ are
non-parameter nodes. Each non-parameter node is labelled by
a function symbol $f \in \Sigma$ and has exactly $\mathrm{ar}(f)$ successors,
with an edge from $u_k$ to $u_l$ labelled by the positive integer $i$
iff $f_i(u_k) = u_l$ occurs in the semi-base formula written in
selector language. A constant node is a node labelled by
some constant symbol $c \in \Sigma$, $\mathrm{ar}(c) = 0$. A constant node
is a sink in the graph; every sink is either a constant or a
parameter node. In addition to the labelling by function
symbols, each node $u \in U$ of the graph is labelled by zero
or more free variables $x$ such that equation $x = u$ occurs in
the semi-base formula.


Definition 21 (Base Formula) A semi-base formula $\beta$ is
a base formula iff the graph associated with $\beta$ is acyclic.


A semi-base formula whose associated graph is cyclic is
unsatisfiable in the term algebra of finite terms. Checking the
cyclicity of a semi-base formula corresponds to the occur-check in
unification algorithms (see e.g. [11]).


Definition 22 By height $H(u)$ of a node $u$ in the acyclic
graph we mean the length of the longest path starting from
$u$.


A node $u$ is a sink iff $H(u) = 0$.
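
Both the acyclicity test of Definition 21 and the height of Definition 22 can be computed in one depth-first traversal. The sketch below assumes a dictionary encoding of the associated graph (non-parameter nodes map to a pair of label and successor list, parameter nodes map to `None`); the encoding and the function name `heights` are choices made for this illustration.

```python
def heights(graph):
    """Return {node: H(node)}, or None if the graph contains a cycle.

    graph maps each non-parameter node to (label, successors) and each
    parameter node to None.
    """
    H, on_path = {}, set()

    def visit(u):
        if u in H:
            return True
        if u in on_path:          # back edge: the occur-check fails
            return False
        on_path.add(u)
        succs = graph[u][1] if graph[u] is not None else []
        best = 0
        for v in succs:
            if not visit(v):
                return False
            best = max(best, H[v] + 1)
        on_path.discard(u)
        H[u] = best               # sinks get height 0
        return True

    for u in graph:
        if not visit(u):
            return None
    return H
```

A result of `None` signals a semi-base formula that is not a base formula; otherwise the sinks are exactly the nodes of height 0.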


Definition 23 We say that an internal variable $u_i$ is a
source variable of a base formula $\beta$ iff $u_i$ is represented by
a node that is a source in the directed acyclic graph
corresponding to $\beta$. Equivalently, if $\beta$ is written in the selector
language, then $u_i$ is a source variable iff $\beta$ contains no
equations of form $u_i = f_j(u_k)$.


Definition 24 If $u_i$ and $u_j$ are internal variables, we write
$u_i \to^* u_j$ if there is a path in the underlying graph from node
$u_i$ to node $u_j$. Equivalently, $u_i \to^* u_j$ iff there exists a term
$t(u_i)$ in the selector language such that $\models \beta \Rightarrow u_j = t(u_i)$.


Relation $\to^*$ is a partial order on internal variables of $\beta$.
The following Lemma 25 is similar to the Independence
of Disequations Lemma in e.g. [10, Page 178].


Lemma 25 Let $\beta$ be a base formula of the form

\[ \exists u_1, \ldots, u_p, u_{p+1}, \ldots, u_{p+q}.\ \beta_0 \]

where $u_{p+1}, \ldots, u_{p+q}$ are parameter variables of $\beta$, and $\beta_0$ is
quantifier-free. Let $S_{p+1}, \ldots, S_{p+q}$ be infinite sets of terms.
Then there exists a valuation $\sigma$ such that $[\![\beta_0]\!]\sigma = \mathrm{true}$ and
$[\![u_i]\!]\sigma \in S_i$ for $p+1 \leq i \leq p+q$.


Proof. To construct $\sigma$ assign first the values to parameter
variables, as follows. Let $h_\beta$ be the length of the longest
path in the graph associated with $\beta$. Pick $\sigma(u_{p+1}) \in S_{p+1}$ so
that $h(\sigma(u_{p+1})) > h_\beta$, and for each $i$ where $p+2 \leq i \leq p+q$
pick $\sigma(u_i) \in S_i$ so that $h(\sigma(u_i)) > h(\sigma(u_{i-1})) + h_\beta$. The
set of heights of an infinite set of terms is infinite, so it is
always possible to choose such $\sigma(u_i)$.

Next consider internal nodes $u_1, \ldots, u_{p+q}$ in some topological
order. For each non-parameter node $u_i$ such
that $u_i = f(u_{i_1}, \ldots, u_{i_k})$ occurs in $\beta_0$, let
$\sigma(u_i) = f(\sigma(u_{i_1}), \ldots, \sigma(u_{i_k}))$.

Finally assign the values to free variables by $\sigma(x) = \sigma(u)$
where $x = u$ occurs in $\beta_0$.

By construction, $[\![\mathrm{structure}]\!]\sigma = \mathrm{true}$ and $[\![\mathrm{labels}]\!]\sigma = \mathrm{true}$.
It remains to show $[\![\mathrm{distinct}]\!]\sigma = \mathrm{true}$, i.e. $\sigma(u_i) \neq \sigma(u_j)$
for $1 \leq i, j \leq p+q$, $i \neq j$. We show this property of $\sigma$
by induction on $m = \min(H(u_i), H(u_j))$. Without loss of
generality we assume $H(u_i) \leq H(u_j)$.

Consider first the case $m = 0$. Then $u_i$ is a parameter
or a constant node.

If $u_i$ is a constant and $u_j$ is a non-parameter variable
then $u_i$ and $u_j$ are labelled by different function symbols so
$\sigma(u_i) \neq \sigma(u_j)$.

If $u_i$ is a constant and $u_j$ is a parameter variable then
$h(\sigma(u_i)) = 0$ whereas $h(\sigma(u_j)) > h_\beta \geq 0$.

Consider the case where $u_i$ is a parameter variable and
$u_j$ is a non-parameter variable. Let

\[ J = \{ j_1 \mid u_{j_1} \text{ is a parameter variable s.t. } u_j \to^* u_{j_1} \} \]

If $J = \emptyset$, then $\beta_0$ uniquely specifies $\sigma(u_j)$, and

\[ h(\sigma(u_j)) = H(u_j) \leq h_\beta < h(\sigma(u_i)) \]

Let $J \neq \emptyset$ and $j_0 = \max J$. If $i \leq j_0$, then

\[ h(\sigma(u_i)) \leq h(\sigma(u_{j_0})) < h(\sigma(u_j)) \]

If $j_0 < i$ then

\[ h(\sigma(u_j)) \leq h(\sigma(u_{j_0})) + h_\beta < h(\sigma(u_{j_0+1})) \leq h(\sigma(u_i)) \]

Now consider the case $m > 0$. $u_i$ and $u_j$ are non-parameter
nodes, so let $u_i = f(u_{i_1}, \ldots, u_{i_k})$ and
$u_j = g(u_{j_1}, \ldots, u_{j_l})$. If $f \neq g$ then clearly $\sigma(u_i) \neq \sigma(u_j)$.
Otherwise, by the congruence closure property of base formulas, there
exists $d$ such that $u_{i_d} \neq u_{j_d}$. Then by induction hypothesis
$\sigma(u_{i_d}) \neq \sigma(u_{j_d})$, so $\sigma(u_i) \neq \sigma(u_j)$. ∎
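
The construction of the valuation in this proof can be sketched in Python. The encoding is an assumption of the example: ground terms are nested tuples, `graph` lists only the non-parameter nodes, `params` gives the parameter nodes in the order of the proof, and each infinite set $S_i$ is represented by a hypothetical callback `pick(u, h)` that returns some element of the set for node `u` with height greater than `h`.

```python
def term_height(t):
    """Height of a ground term given as (symbol, child, ...)."""
    return 0 if len(t) == 1 else 1 + max(term_height(c) for c in t[1:])

def tower(n, base=("a",), unary="g"):
    """The term g(g(...g(a))) with n applications of g."""
    t = base
    for _ in range(n):
        t = (unary, t)
    return t

def build_valuation(graph, params, pick):
    """Assign a ground term to every internal node of an acyclic graph.

    graph:  non-parameter node -> (label, list of successors)
    params: ordered list of parameter nodes
    pick:   pick(u, h) returns a term for parameter u of height > h
    """
    H = {}
    def height(u):                       # length of longest path from u
        if u not in H:
            succs = graph[u][1] if u in graph else []
            H[u] = max((1 + height(v) for v in succs), default=0)
        return H[u]

    h_beta = max(height(u) for u in list(graph) + list(params))

    sigma, bound = {}, h_beta
    for u in params:                     # heights grow by more than h_beta
        sigma[u] = pick(u, bound)
        bound = term_height(sigma[u]) + h_beta

    def value(u):                        # non-parameter nodes, bottom-up
        if u not in sigma:
            label, succs = graph[u]
            sigma[u] = (label,) + tuple(value(v) for v in succs)
        return sigma[u]

    for u in graph:
        value(u)
    return sigma
```

With the single conjunct $u_1 = g(u_2)$, parameters $u_2, u_3$ and both sets equal to the towers over `g`, a naive choice such as $\sigma(u_2) = a$, $\sigma(u_3) = g(a)$ would make $u_1$ and $u_3$ collide; the height gaps chosen above rule this out.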


Corollary 26 Every base formula is satisfiable. 


Proposition 27 (Quantification of Base Formula) If
$\beta$ is a base formula and $x$ a free variable in $\beta$, then there
exists a base formula $\beta_1$ equivalent to $\exists x. \beta$.

Proof. Consider a formula $\exists x. \beta$ where $\beta$ is a base formula.
The only place where $x$ occurs in $\beta$ is $x = u_j$, in the subformula
labels. By dropping the conjunct $x = u_j$ from $\beta$ we
obtain a base formula $\beta_1$ where $\beta_1$ is equivalent to $\exists x. \beta$. ∎


Proposition 28 (Quantifier-Free to Base) Every well- 
defined quantifier-free formula in constructor-selector lan- 
guage can be written as true, false, or a disjunction of base 
formulas. 


Proof Sketch. Let $\phi$ be a well-defined quantifier-free
formula in constructor-selector language. By Proposition 8
we can transform $\phi$ into an equivalent formula in disjunctive
normal form

\[ \psi_1 \lor \cdots \lor \psi_p \]

where each $\psi_i$ is a well-defined conjunction of literals.
Consider an arbitrary $\psi_i$. There exists an unnested quantifier-free
formula $\psi'_i$ with additional fresh free variables $x_1, \ldots, x_q$
such that $\psi_i$ is equivalent to

\[ \exists x_1, \ldots, x_q.\ \psi'_i \]

By distributivity and (6) it suffices to transform each conjunction
of unnested formulas into a disjunction of base formulas.
In the sequel we will assume that transformations based
on distributivity and (6) are applied whenever we transform a
conjunction of literals into a formula containing disjunction.
We also assume that every equation $f(x_1, \ldots, x_n) = y$ is
replaced by the equivalent one $y = f(x_1, \ldots, x_n)$ and every
equation $f_i(x) = y$ is replaced by $y = f_i(x)$.

Because of our assumption that $\Sigma$ is finite, we can eliminate
every literal of form $\lnot \mathrm{Is}_f(x)$ using the equivalence

\[ \lnot \mathrm{Is}_f(x) \iff \bigvee_{g \in \Sigma \setminus \{f\}} \mathrm{Is}_g(x) \tag{17} \]

which follows from (16). We then transform the formula back
into disjunctive normal form and propagate the existential
quantifiers to the conjunctions of literals. We may therefore
assume that there are no literals of form $\lnot \mathrm{Is}_f(x)$ in the
conjunction. Furthermore, $\mathrm{Is}_f(x) \land \mathrm{Is}_g(x) \iff \mathrm{false}$ for $f \neq g$,
so we may assume that for every variable $x$ there is at most one
literal $\mathrm{Is}_f(x)$ for some $f$. If $f_i(x)$ occurs in the conjunction,
because the conjunction is well-defined, we may always add
the conjunct $\mathrm{Is}_f(x)$. This way we ensure that, for each variable $x$
to which a selector is applied, exactly one
literal of form $\mathrm{Is}_f(x)$ occurs in the conjunction.

We next ensure that every variable has either none or
all of its components named by variables. If the conjunction
contains literal $\mathrm{Is}_f(x)$ but does not contain $x = f(x_1, \ldots, x_n)$
and does not contain an equation of form $y = f_i(x)$ for
every $i$, $1 \leq i \leq \mathrm{ar}(f)$, we introduce a fresh existentially
quantified variable for each $i$ such that an equation of form
$y = f_i(x)$ does not appear in the conjunction. At this point
we may transform the entire conjunction into constructor
language by replacing

\[ \mathrm{Is}_f(u_i) \land v_1 = f_1(u_i) \land \cdots \land v_k = f_k(u_i) \]

with $u_i = f(v_1, \ldots, v_k)$ for $k = \mathrm{ar}(f)$.

We next ensure that for every two variables $x_1$ and $x_2$
occurring in the conjunction exactly one of the conjuncts
$x_1 = x_2$ or $x_1 \neq x_2$ is present. Namely, if both conjuncts
$x_1 = x_2$ and $x_1 \neq x_2$ are present, the conjunction is false.
If none of the conjuncts is present, we insert the disjunction
$x_1 = x_2 \lor x_1 \neq x_2$ as one of the conjuncts and transform
the result into a disjunction of existentially quantified
conjunctions.

We next perform congruence closure for finite terms
on the resulting conjunction, using the fact that equality is
reflexive, symmetric, transitive and congruent with respect
to free operations $f \in \mathrm{Cons}(\Sigma)$ and that $t(x) \neq x$ for every
term $t(x) \not\equiv x$. Syntactically, the result of congruence closure
can be viewed as adding new equations to the conjunction.
If the congruence closure procedure establishes that the formula
is unsatisfiable, the result is false. Otherwise, all variables
are grouped into equivalence classes. If an equation $u_i = u_j$
occurs in the conjunction where both $u_i$ and $u_j$ are internal
variables, we replace $u_i$ with $u_j$ in the formula and eliminate
the existential quantifier. If for some free variable $x$
there is no internal variable $u$ such that the conjunct $x = u$
occurs, we introduce a new existentially quantified variable $u$
and a conjunct $x = u$. These transformations ensure that
for every equivalence class there exists exactly one internal
variable in the formula. It is now easy to pick representative
conjuncts from the conjunction to obtain a conjunction of the
syntactic form in Definition 19 of semi-base formula. The
resulting formula is a base formula because the congruence closure
algorithm ensures that the associated graph is acyclic. ∎


We next turn to the problem of transforming a base for- 
mula into a quantifier-free formula. We will present two 
constructions. The first construction yields a quantifier-free 
formula in constructor-selector language and is sufficient for 
the purpose of quantifier elimination. The second construc- 
tion yields a quantifier-free formula in selector language and 
is slightly more involved; we present it to provide additional 
insight into the quantifier elimination approach to term al- 
gebras. 

We first introduce notions of covered and determined
variables of a base formula $\beta$. The basic idea behind these
notions is that $\beta$ implies a functional dependence from the
free variables of $\beta$ to each of the determined variables.

In both constructions we use the notion of a covered
variable, which denotes a component of a term denoted by
some free variable. In the first construction we also use the
notion of determined variable, which includes covered variables
as well as variables constructed from covered variables
using constructor operations $f \in \mathrm{Cons}(\Sigma)$.


Definition 29 Consider an arbitrary base formula $\beta$. We
say that an internal variable $u$ is covered by a free variable
$x$ iff $x = u'$ occurs in $\beta$ for some $u'$ such that $u' \to^* u$. An
internal variable $u$ is covered iff $u$ is covered by $x$ for some
free variable $x$ (in particular, if $x = u$ occurs in $\beta$ then $u$
is covered). Let covered denote the set of covered internal
variables of the base formula, and let uncovered $= U \setminus$ covered
where $U$ is the set of all internal variables of $\beta$.


Lemma 30 (Covered Base to Selector) Every base
formula without uncovered variables is equivalent to a
quantifier-free formula in selector language.


Proof. Consider a base formula $\beta$ where every variable is
covered. Consider an arbitrary quantified variable $u$. Because
$u$ is covered, there exists a variable $x$ free in $\beta$ such that
$u = t(x)$ for some term $t$ in the selector language. Replace
every occurrence of $u$ in the matrix of $\beta$ by $t(x)$ and eliminate
the quantification over $u$. Repeating this process for
every variable $u$ we obtain a quantifier-free formula equivalent
to $\beta$. ∎


Definition 31 Let $\beta$ be a base formula. The set determined
of determined variables of $\beta$ is the smallest set $S$ that
contains the set covered and satisfies the following condition:
if $u$ is a non-parameter node and all successors $u_1, \ldots, u_k$
($k \geq 0$) of $u$ in the associated graph are in $S$, then $u$ is also
in $S$.


In particular, every constant node is determined. A parameter
node $u$ is determined iff $u$ is covered.
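
Definitions 29 and 31 translate into a reachability computation followed by a least fixed point. The Python sketch below assumes a dictionary encoding of the associated graph (non-parameter nodes map to a pair of label and successor list, parameter nodes to `None`) and a dictionary `labels` from each free variable to the node it is equated with; these encodings and the function names are illustrative assumptions.

```python
def reachable(graph, start):
    """All nodes v with start ->* v, including start itself."""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(graph[u][1] if graph.get(u) else [])
    return seen

def covered(graph, labels):
    """Nodes that are components of a term denoted by a free variable."""
    out = set()
    for node in labels.values():
        out |= reachable(graph, node)
    return out

def determined(graph, labels):
    """Least set containing covered and closed under constructors."""
    det = covered(graph, labels)
    changed = True
    while changed:
        changed = False
        for u, entry in graph.items():
            # Only non-parameter nodes can be added; constants have no
            # successors, so they are added in the first round.
            if u not in det and entry is not None \
               and all(v in det for v in entry[1]):
                det.add(u)
                changed = True
    return det
```

In this encoding an uncovered parameter node is never added by the fixed-point loop, which matches the remark that a parameter node is determined iff it is covered.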


Lemma 32 If a node $u$ is not determined, then there exists
an uncovered parameter node $v$ such that $u \to^* v$.


Proof. The proof is by induction on $H(u)$. If $H(u) = 0$
then $u$ has no successors, and $u$ cannot be a constant node
because it is not determined. Therefore, $u$ is a parameter
node, so we may let $v = u$. Assume that the statement
holds for every node $u'$ such that $H(u') \leq k$ and let
$H(u) = k + 1$. Because $u$ is not determined, there exists
a successor $u'$ of $u$ such that $u'$ is not determined, so by
induction hypothesis there exists an uncovered parameter
node $v$ such that $u' \to^* v$. Hence $u \to^* u' \to^* v$.


Lemma 33 Every base formula $\beta$ is equivalent to a base
formula $\beta'$ obtained from $\beta$ by eliminating all nodes that are
not determined.


Proof. Construct $\beta'$ from $\beta$ by eliminating all terms
containing a variable $u \in U \setminus$ determined and eliminating
the corresponding existential quantifiers. Then all variables
in $\beta'$ are determined. $\beta'$ has fewer conjuncts than $\beta$, so
$\models \beta \Rightarrow \beta'$. To show $\models \beta' \Rightarrow \beta$, let $\sigma$ be any assignment
of terms to determined variables of $\beta$ such that $\beta'$ evaluates
to true under $\sigma$. As in the proof of Lemma 25, define the
extension $\sigma'$ of $\sigma$ as follows. Choose sufficiently large values
$\sigma'(v)$ for every uncovered parameter variable $v$, so that $\sigma''$ defined
as the unique extension of $\sigma'$ to the remaining undetermined
variables assigns different terms to different variables. This
is possible because the term model is infinite. The resulting
assignment $\sigma''$ satisfies the matrix of the base formula
$\beta$. Therefore, $\models \beta' \Rightarrow \beta$, so $\beta$ and $\beta'$ are equivalent base
formulas. ∎


First Construction 


Proposition 34 (Base to Constructor-Selector) 
Every base formula 6 is equivalent to a quantifier-free 
formula @ in constructor-selector language. 


Proof. By Lemma [33] we may assume that all variables
in $\beta$ are determined. To every variable $u$ we assign a term
$\tau(u)$. Term $\tau(u)$ is in constructor-selector language and the
variables of $\tau(u)$ are among the free variables of $\beta$. If $u \in
\mathit{covered}$, we assign $\tau(u)$ as in the proof of Lemma. If
$u_1, \ldots, u_k$ are the successors of a determined node $u$, we
put

\[ \tau(u) = f(\tau(u_1), \ldots, \tau(u_k)) \]

where $f$ is the label of node $u$. This definition uniquely
determines $\tau(u)$ for all $u \in \mathit{determined}$. We obtain the
quantifier-free formula $\phi$ by replacing every variable $u$ with
$\tau(u)$ and eliminating all quantifiers.

For every $u$ we have $\models \beta \Rightarrow u = \tau(u)$, so $\models \beta \Rightarrow \phi$.
Conversely, if $\phi$ is satisfied then $\tau$ defines an assignment for
the $u$ variables which makes the matrix of $\beta$ true. Therefore $\beta$
and $\phi$ are equivalent. ∎
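The recursive definition of $\tau$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions: covered nodes are given directly as selector terms over free variables (the dictionary `covered_term` is hypothetical input, standing in for the construction from the earlier lemma), and terms are rendered as plain strings.

```python
def tau(u, label, succ, covered_term):
    """Return tau(u) as a string in constructor-selector syntax."""
    if u in covered_term:
        # Covered node: already expressible by selectors over free variables.
        return covered_term[u]
    # Determined, uncovered node: apply its constructor label to tau of successors.
    args = [tau(k, label, succ, covered_term) for k in succ[u]]
    return label[u] if not args else f"{label[u]}({', '.join(args)})"


# Hypothetical base formula fragment: u = f(v, k), k = a, and v = f1(x)
# for a free variable x (so v is covered).
label = {"u": "f", "k": "a"}
succ = {"u": ["v", "k"], "k": []}
covered_term = {"v": "f1(x)"}
term_for_u = tau("u", label, succ, covered_term)
```

The recursion terminates because the associated graph is acyclic, and it is well defined exactly for determined nodes, which is why the proof first removes the undetermined ones.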


Second Construction The reason for using constructor
symbols $f \in \mathrm{Cons}(\Sigma)$ in the first construction is to preserve
the constraints of form $u \ne v$ when eliminating node
$u$ with successors $u_1, \ldots, u_k$. Using constructor symbols we
would obtain the constraint $f(u_1, \ldots, u_k) \ne v$. Our second
construction avoids introducing constructor operations by
decomposing $f(u_1, \ldots, u_k) \ne v$ into a disjunction of inequalities
of form $u_i \ne f_i(v)$. When $v$ is a parameter node, the
presence of term $f_i(v)$ potentially requires introducing a new
node in the associated graph; we call this process parameter
expansion. Parameter expansion may increase the total
number of nodes in the graph, but it decreases the number
of uncovered nodes, so the process of converting a base
formula to a quantifier-free formula in the selector language
terminates.


Lemma 35 Let $\beta$ be an arbitrary base formula.

1. If $u$ is covered and $u \to^* u'$ then $u'$ is covered as well.

2. If $u'$ is uncovered and $u'$ is not a source, then there
exists $u \ne u'$ such that $u \to^* u'$ and $u$ is also uncovered.

3. If $\beta$ contains an uncovered variable then $\beta$ contains an
uncovered variable that is a source.


Proof. By definition. ∎


Parameter Expansion We define the operation of expanding
a parameter node in a base formula as follows. Let
$\beta$ be an arbitrary base formula and $w$ a parameter variable
in $\beta$. The result of expansion of $w$ is a disjunction of base
formulas $\beta'$ generated by applying to $w$. In each of
the resulting formulas $\beta'$ variable $w$ is not a parameter any
more. Each $\beta'$ contains $\mathrm{Is}_f(w)$ for some $f \in \Sigma$ and node $w$
has successors $u_1, \ldots, u_k$ for $k = \mathrm{ar}(f)$. Each successor $u_i$ is
either an existing internal variable or a fresh variable. For a
given $\beta$, parameter expansion generates a disjunction of formulas $\beta'$
for every choice of $f \in \Sigma$ and every choice of successors $u_i$,
subject to congruence closure so that $\beta'$ is a base formula:
we discard the choices of successors of $w$ that yield formulas
$\beta'$ violating congruence of equality. (This process is similar
to converting quantifier-free formulas into a disjunction of
base formulas in the proof of Proposition [28].) The following
lemma shows the correctness of parameter expansion.


Lemma 36 (Parameter expansion soundness) Let
$A = \beta_1 \vee \cdots \vee \beta_k$ be the disjunction generated by parameter
expansion of a base formula $\beta$. Then $A$ is equivalent to $\beta$.


Lemma [36] justifies the use of parameter expansion in the
following Lemma [37].


Lemma 37 Every base formula $\beta$ can be written as a disjunction
of base formulas without uncovered variables.


Proof Sketch. By Lemma [33] we may assume that all variables
of $\beta$ are determined. Suppose $\beta$ contains an uncovered
variable. Then by Lemma [35] $\beta$ contains an uncovered variable
$u_0$ such that $u_0$ is a source. Because $u_0$ is uncovered
and determined, it is not a parameter node. We show how to
eliminate $u_0$ without introducing new uncovered variables.

Our goal is to eliminate $u_0$ from the associated graph.
We need to preserve information that $u_0$ is distinct from
variables $u \in U \setminus \{u_0\}$ in the graph. We consider two cases.

If $u$ is not a parameter node, then by congruence closure
either $u_0$ and $u$ are labelled by different function symbols,
or they are labelled by the same function symbol $f \in \Sigma$
with $\mathrm{ar}(f) = k$ and there exist $i$, $1 \le i \le k$, and variables
$u_i = f_i(u_0)$ and $u_j = f_i(u)$ such that $u_i \ne u_j$. Hence the
constraint $u_0 \ne u$ is deducible from the inequalities of other
variables in $\beta$ and we can eliminate $u_0$ without changing the
truth value of $\beta$.

Next consider the case when $u$ is a parameter node. By
assumption $u$ is determined, and because it is a parameter, it
is covered. We then perform parameter node expansion as
described above. The result of elimination of $u_0$ in $\beta$ is a
disjunction of base formulas $\beta'$; in each $\beta'$ every parameter
node is expanded. If $u$ is a parameter node in $\beta$ then the
constraint $u_0 \ne u$ is preserved in each $\beta'$ because $u$ is not a
parameter node in $\beta'$, so the previous argument applies.

Because the parameter nodes being expanded are covered,
so are their successor nodes introduced by parameter
expansion. Therefore, by repeatedly applying elimination
of uncovered variables for every uncovered variable $u_0$, we
obtain a disjunction $A$ of formulas $\beta'$ where each $\beta'$ has no
uncovered variables, and $A$ is equivalent to $\beta$. ∎


Proposition 38 (Base to Selector) For every base formula
$\beta$ there exists an equivalent quantifier-free formula $\psi$
in selector language.


Proof. By Lemma [37] $\beta$ is equivalent to a disjunction
$\beta_1 \vee \cdots \vee \beta_n$ where each $\beta_i$ has no uncovered variables. By
Lemma each $\beta_i$ is equivalent to some quantifier-free formula
$\psi_i$, so $\beta$ is equivalent to the quantifier-free formula
$\psi_1 \vee \cdots \vee \psi_n$. ∎


The final theorem in this section summarizes quantifier elim- 
ination for term algebra. 


Theorem 39 (Term Algebra Quantifier Elimination)
There exist algorithms $A$, $B$, $C$ such that for a given formula
$\phi$ in constructor-selector language of term algebras:


a) $A$ produces a quantifier-free formula $\phi'$ in constructor-selector
language


b) $B$ produces a quantifier-free formula $\phi'$ in selector language


c) $C$ produces a disjunction $\phi'$ of base formulas


Proof. a): Transform formula $\phi$ into prenex form

\[ Q_1 x_1 \ldots Q_{n-1} x_{n-1}\, Q_n x_n.\ \phi^* \]

where $\phi^*$ is quantifier free, as in Section [3.1]. We eliminate
the innermost quantifier $Q_n$ as follows.

Suppose first that $Q_n$ is $\exists$. Transform the matrix $\phi^*$ into
disjunctive normal form $C_1 \vee \cdots \vee C_m$. By Proposition
transform $C_1 \vee \cdots \vee C_m$ into a disjunction $\beta_1 \vee \cdots \vee \beta_m$ of base
formulas. Then propagate $\exists$ into individual disjuncts, using

\[ \exists x_n.\ \beta_1 \vee \cdots \vee \beta_m \iff (\exists x_n.\beta_1) \vee \cdots \vee (\exists x_n.\beta_m) \]

By Proposition [27] an existentially quantified base formula
is again a base formula, so $\exists x_n.\beta_i \iff \beta_i'$ for some $\beta_i'$. We
thus obtain the formula

\[ Q_1 x_1 \ldots Q_{n-1} x_{n-1}.\ \beta_1' \vee \cdots \vee \beta_m' \tag{18} \]

By Proposition every base formula is equivalent to a
quantifier-free formula in selector language, so (18) is equivalent
to

\[ Q_1 x_1 \ldots Q_{n-1} x_{n-1}.\ \psi \]

where $\psi$ is a quantifier-free formula. Hence, we have eliminated
the innermost existential quantifier.

Next consider the case when $Q_n$ is $\forall$. Then $\phi$ is equivalent
to

\[ Q_1 x_1 \ldots Q_{n-1} x_{n-1}\, \neg \exists x_n.\ \neg\phi^* \]

Apply the procedure for eliminating $x_n$ to $\neg\phi^*$. The result
is a formula of form

\[ Q_1 x_1 \ldots Q_{n-1} x_{n-1}.\ \neg\psi \tag{19} \]

where $\psi$ is quantifier free. But $\neg\psi$ is also quantifier free, so
we have eliminated the innermost universal quantifier. By
repeating this process we eliminate all quantifiers, yielding
the desired formula $\phi'$.

The direct construction for showing b) is analogous to
a), but uses Proposition [38] in place of Proposition [34]. To
show c), apply e.g. construction a) to obtain a quantifier-free
formula $\psi$ and then transform $\psi$ into a disjunction of base
formulas using Proposition [28]. ∎
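The control structure of this proof (process the prefix from the inside out, and treat $\forall$ as $\neg\exists\neg$) can be sketched in Python. The sketch is an illustration only: the eliminator passed in is a stand-in that enumerates a finite domain, not the term-algebra procedure of this section, and all names (`eliminate_prefix`, `make_exists_eliminator`, `negate`) are hypothetical.

```python
def negate(f):
    """Negation of a formula represented as a predicate on environments."""
    return lambda env: not f(env)


def make_exists_eliminator(domain):
    """Toy stand-in for an existential-quantifier eliminator: it enumerates a
    finite domain instead of rewriting the formula symbolically."""
    def eliminate_exists(x, matrix):
        return lambda env: any(matrix({**env, x: d}) for d in domain)
    return eliminate_exists


def eliminate_prefix(prefix, matrix, eliminate_exists):
    """Eliminate quantifiers innermost first; forall x. F is handled as
    not exists x. not F, exactly as in the proof above."""
    for quantifier, x in reversed(prefix):
        if quantifier == "exists":
            matrix = eliminate_exists(x, matrix)
        else:
            matrix = negate(eliminate_exists(x, negate(matrix)))
    return matrix


elim = make_exists_eliminator([0, 1])
differ = lambda env: env["x"] != env["y"]
# forall x. exists y. x != y   and   exists y. forall x. x != y   over {0, 1}
forall_exists = eliminate_prefix([("forall", "x"), ("exists", "y")], differ, elim)({})
exists_forall = eliminate_prefix([("exists", "y"), ("forall", "x")], differ, elim)({})
```

Only the existential case needs a dedicated elimination step; the universal case reuses it through two negations, which is why closure of the quantifier-free formulas under negation matters in the proof.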


This completes our description of quantifier elimination 
for term algebras. 

We remark that there are alternative ways to define base 
formula. In particular the requirement on disequality of all 
variables is not necessary. This requirement may lead to 
unnecessary case analysis when converting a quantifier-free 
formula to disjunction of base formulas, but we believe that 
it simplifies the correctness argument. 


4 The Pair Constructor and Two Constants 


In this section we give a quantifier elimination procedure for
structural subtyping of non-recursive types with two constant
symbols and one covariant binary constructor. The two
constants correspond to two primitive types; the binary
covariant constructor corresponds to the pair constructor
for building products of types.

The construction in this section is an introduction to 
the more general construction in Section [5] where we give a 
quantifier elimination procedure for any number of constant 
symbols and relations between them. The construction in 
this section demonstrates the interaction between the term 
and boolean algebra components of the structural subtyping. 
We therefore believe the construction captures the essence 
of the general result of Section [5] 

The basic observation behind the quantifier elimination
procedure for two constant symbols is that the structure of
terms in this language is isomorphic to a disjoint union of
boolean algebras with some additional term structure connecting
elements from different boolean algebras. As we argue
below, the structural subtyping structure contains one
copy of boolean algebra for every equivalence class of terms
that have the same "shape", i.e. are the same up to the constants
in the leaves.

Consider a signature $\Sigma = \{a, b, g\}$ where $a$ and $b$ are
constant symbols and $g$ is a function symbol of arity 2. We
define a partial order $\le$ on the set $\mathrm{FT}(\Sigma)$ of ground terms
over $\Sigma$ as the least reflexive partial order relation $\rho$ satisfying


1. $a \mathrel{\rho} b$;
2. $(s_1 \mathrel{\rho} t_1) \wedge (s_2 \mathrel{\rho} t_2) \Rightarrow g(s_1, s_2) \mathrel{\rho} g(t_1, t_2)$.


The structure with equality in the language $\{a, b, g, \le\}$,
where $\le$ is interpreted as above and $a, b, g$ are interpreted
as free operations on term algebra, corresponds to the structural
subtyping with two base types $a$ and $b$ and one binary
type constructor $g$, with $g$ covariant in both arguments. We
denote this structure by BS. We proceed to show that BS
admits quantifier elimination and is therefore decidable.


4.1 Boolean Algebras on Equivalent Terms 


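In preparation for the quantifier elimination procedure we
define certain operations and relations on terms. We also
establish some fundamental properties of the structure BS.
Define a new signature $\Sigma_0 = \{c^s, g^s\}$ as an abstraction of
signature $\Sigma = \{a, b, g\}$. Define function $\mathrm{shapified} : \Sigma \to \Sigma_0$
by

\[ \begin{array}{rcl}
\mathrm{shapified}(a) &=& c^s \\
\mathrm{shapified}(b) &=& c^s \\
\mathrm{shapified}(g) &=& g^s
\end{array} \]

Let $\mathrm{ar}(\mathrm{shapified}(f)) = \mathrm{ar}(f)$ for each $f \in \Sigma$; in this case $c^s$ is
a constant and $g^s$ is a binary function symbol. Let $\mathrm{FT}(\Sigma_0)$
be the set of ground terms over the signature $\Sigma_0$. Define the
shape of a term $t$ as the function $\mathrm{sh} : \mathrm{FT}(\Sigma) \to \mathrm{FT}(\Sigma_0)$, by
letting

\[ \mathrm{sh}(f(t_1, \ldots, t_k)) = \mathrm{shapified}(f)(\mathrm{sh}(t_1), \ldots, \mathrm{sh}(t_k)) \]

for $k = \mathrm{ar}(f)$. In this case we have

\[ \begin{array}{rcl}
\mathrm{sh}(a) &=& c^s \\
\mathrm{sh}(b) &=& c^s \\
\mathrm{sh}(g(t_1, t_2)) &=& g^s(\mathrm{sh}(t_1), \mathrm{sh}(t_2))
\end{array} \]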


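Define $t_1 \sim t_2$ iff $\mathrm{sh}(t_1) = \mathrm{sh}(t_2)$. Then $\sim$ is the smallest
equivalence relation $\rho$ such that


1. $a \mathrel{\rho} b$;
2. $(s_1 \mathrel{\rho} t_1) \wedge (s_2 \mathrel{\rho} t_2) \Rightarrow g(s_1, s_2) \mathrel{\rho} g(t_1, t_2)$.

For every term $t$ define the word $\mathrm{tCont}(t) \in \{0, 1\}^*$ by letting

\[ \begin{array}{rcl}
\mathrm{tCont}(a) &=& 0 \\
\mathrm{tCont}(b) &=& 1 \\
\mathrm{tCont}(g(t_1, t_2)) &=& \mathrm{tCont}(t_1) \cdot \mathrm{tCont}(t_2)
\end{array} \]

The set of all words $w \in \{0, 1\}^n$ is isomorphic to the boolean
algebra $B_n$ of all subsets of some finite set of cardinality
$n$, so we write $w_1 \cap w_2$, $w_1 \cup w_2$, $w^c$ for operations corresponding
to intersection, union, and set complement in the set of
words $w \in \{0, 1\}^n$. We write $w_1 \subseteq w_2$ for $w_1 \cap w_2 = w_1$.
Define function $\delta$ by

\[ \delta(t) = \langle \mathrm{sh}(t), \mathrm{tCont}(t) \rangle \]

For a term $t$ in any language containing constant symbols, let
$\mathrm{tLen}(t)$ denote the number of occurrences of constant symbols
in $t$. If $w$ is a sequence of elements of some set, let
$\mathrm{sLen}(w)$ denote the length of the sequence. Observe that
$\mathrm{sLen}(\mathrm{tCont}(t)) = \mathrm{tLen}(t)$ and $\mathrm{tLen}(\mathrm{sh}(t)) = \mathrm{tLen}(t)$. Moreover,
$t_1 \sim t_2$ implies $\mathrm{sLen}(\mathrm{tCont}(t_1)) = \mathrm{sLen}(\mathrm{tCont}(t_2))$. Define
the set $B$ by

\[ B = \{\langle s, w \rangle \mid s \in \mathrm{FT}(\Sigma_0),\ w \in \{0, 1\}^*,\ \mathrm{tLen}(s) = \mathrm{sLen}(w)\} \]

Function $\delta$ is a bijection from the set $\mathrm{FT}(\Sigma)$ to the set $B$.
For $b_1, b_2 \in B$ define $b_1 \le b_2$ iff $\delta^{-1}(b_1) \le \delta^{-1}(b_2)$. From
the definitions it follows

\[ \langle s_1, w_1 \rangle \le \langle s_2, w_2 \rangle \iff s_1 = s_2 \wedge w_1 \subseteq w_2 \]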
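The decomposition of a term into a shape and a bit word can be checked on small terms. The Python sketch below is an illustration under assumed encodings that are not part of the paper: constants are the strings `"a"` and `"b"`, a pair is the tuple `("g", t1, t2)`, the shape constant $c^s$ is the string `"c"`, and words are strings over `0` and `1`. It compares the structural definition of $\le$ with the characterization through $\delta$ on all terms of depth at most 2.

```python
from itertools import product


def sh(t):
    """Shape of a term: forget which constant sits at each leaf."""
    return "c" if t in ("a", "b") else ("g", sh(t[1]), sh(t[2]))


def tcont(t):
    """Leaf contents read left to right: a -> 0, b -> 1."""
    if t == "a":
        return "0"
    if t == "b":
        return "1"
    return tcont(t[1]) + tcont(t[2])


def delta(t):
    return (sh(t), tcont(t))


def leq(s, t):
    """Structural definition: least reflexive order with a <= b, g covariant."""
    if isinstance(s, str) and isinstance(t, str):
        return s == t or (s == "a" and t == "b")
    if isinstance(s, tuple) and isinstance(t, tuple):
        return leq(s[1], t[1]) and leq(s[2], t[2])
    return False


def leq_delta(s, t):
    """Same shape, and bitwise inclusion of the words."""
    (s1, w1), (s2, w2) = delta(s), delta(t)
    return s1 == s2 and all(x <= y for x, y in zip(w1, w2))


def terms(depth):
    """All ground terms over {a, b, g} of depth at most `depth`."""
    ts = ["a", "b"]
    for _ in range(depth):
        ts = ["a", "b"] + [("g", l, r) for l, r in product(ts, ts)]
    return ts


TERMS = terms(2)
agree = all(leq(s, t) == leq_delta(s, t) for s in TERMS for t in TERMS)
injective = len({delta(t) for t in TERMS}) == len(TERMS)
```

A bounded check of this kind does not replace the argument from the definitions; it only makes the correspondence concrete for the 38 terms of depth at most 2.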


If $g$ is defined on $B$ via isomorphism $\delta$ we also have

\[ g(\langle s_1, w_1 \rangle, \langle s_2, w_2 \rangle) = \langle g^s(s_1, s_2),\ w_1 \cdot w_2 \rangle \]

For any fixed $s_0 \in \mathrm{FT}(\Sigma_0)$, the set

\[ B(s_0) = \{\langle s, w \rangle \in B \mid s = s_0\} \tag{20} \]

is isomorphic to the boolean algebra $B_n$, where $n = \mathrm{tLen}(s_0)$.
Accordingly, we introduce on each $B(s)$ the set operations
$t_1 \cap_s t_2$, $t_1 \cup_s t_2$, $t_1^{c_s}$. Expressions $t_1 \cap_s t_2$ and $t_1 \cup_s t_2$ are
defined iff $\mathrm{sh}(t_1) = s$ and $\mathrm{sh}(t_2) = s$, whereas expression $t_1^{c_s}$
is defined iff $\mathrm{sh}(t_1) = s$.

We also introduce cardinality expressions as in Section [3.2].
If $t$ denotes a term, then the expression $|t|_s$ denotes
the number of elements of the set corresponding to $t$.
Here we require $s = \mathrm{sh}(t)$. We use expressions $|t|_s = k$ and
$|t|_s \ge k$ as atomic formulas for constant integer $k \ge 0$. Note
that

\[ t_1 \le t_2 \iff \mathrm{sh}(t_1) = \mathrm{sh}(t_2) \wedge |t_1 \cap t_2^c|_{\mathrm{sh}(t_1)} = 0 \tag{21} \]

\[ t_1 = t_2 \iff \mathrm{sh}(t_1) = \mathrm{sh}(t_2) \wedge |(t_1 \cap t_2^c) \cup (t_1^c \cap t_2)|_{\mathrm{sh}(t_1)} = 0 \tag{22} \]

Let $\mathrm{sh}(t_1) = s_1$, $\mathrm{sh}(t_2) = s_2$, and $s = g^s(s_1, s_2)$. Then

\[ |g(t_1, t_2)|_s = |t_1|_{s_1} + |t_2|_{s_2} \tag{23} \]

Equation (23) allows decomposing formulas of form
$|g(t_1, t_2)|_s = k$ into propositional combinations of formulas
of form $|t_1|_{s_1} \ge k$ and $|t_2|_{s_2} \ge k$.

Note further that the following equations hold:

\[ \begin{array}{rcl}
g(t_1, t_2) \cap g(t_1', t_2') &=& g(t_1 \cap t_1',\ t_2 \cap t_2') \\
g(t_1, t_2) \cup g(t_1', t_2') &=& g(t_1 \cup t_1',\ t_2 \cup t_2') \\
g(t_1, t_2)^c &=& g(t_1^c,\ t_2^c)
\end{array} \]

If $E(x_1, \ldots, x_n)$ denotes an expression consisting only of
operations of boolean algebra, then from these equations it
follows by induction that

\[ E(g(t_1, t_1'), \ldots, g(t_n, t_n')) = g(E(t_1, \ldots, t_n),\ E(t_1', \ldots, t_n')) \tag{24} \]

Equations (24) and (23) imply

\[ |E(g(t_1, t_1'), \ldots, g(t_n, t_n'))| = |E(t_1, \ldots, t_n)| + |E(t_1', \ldots, t_n')| \tag{25} \]
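Equations (23) to (25) can be exercised on concrete terms. The Python sketch below is a hedged illustration: it uses the assumed encoding of constants as `"a"`, `"b"` and pairs as `("g", t1, t2)`, implements intersection and complement leafwise, and checks the instance $E(x, y) = x \cap y^c$ of (24) and (25) for one hypothetical choice of argument shapes. The helper names are not from the paper.

```python
from itertools import product


def card(t):
    """|t|: the number of leaves equal to b (the 1 bits of tCont(t))."""
    return int(t == "b") if isinstance(t, str) else card(t[1]) + card(t[2])


def meet(s, t):
    """Intersection of two terms of the same shape, computed leafwise."""
    if isinstance(s, str):
        return "b" if (s, t) == ("b", "b") else "a"
    return ("g", meet(s[1], t[1]), meet(s[2], t[2]))


def compl(t):
    """Complement within the boolean algebra of the shape of t."""
    if isinstance(t, str):
        return "a" if t == "b" else "b"
    return ("g", compl(t[1]), compl(t[2]))


def terms_of_shape(shape):
    """All terms whose shape is `shape` ('c' or ('g', s1, s2))."""
    if shape == "c":
        return ["a", "b"]
    return [("g", l, r)
            for l in terms_of_shape(shape[1])
            for r in terms_of_shape(shape[2])]


S1, S2 = ("g", "c", "c"), "c"   # hypothetical shapes of the two components
ok = True
for t1, u1 in product(terms_of_shape(S1), repeat=2):
    for t2, u2 in product(terms_of_shape(S2), repeat=2):
        lhs = meet(("g", t1, t2), compl(("g", u1, u2)))            # E applied to pairs
        rhs = ("g", meet(t1, compl(u1)), meet(t2, compl(u2)))       # pair of E's
        ok = ok and lhs == rhs and card(lhs) == card(rhs[1]) + card(rhs[2])
```

The check covers only intersection and complement; union behaves the same way by De Morgan's laws.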


Boolean algebra $B(g^s(s_1, s_2))$ is isomorphic to the product
of boolean algebras $B(s_1)$ and $B(s_2)$; the constructor $g$ acts
as union of disjoint sets.

$a, b$ :: term
$g$ :: term × term → term
$\mathrm{Is}_g$ :: term → bool
$g_1, g_2$ :: term → term
$=$ :: term × term → bool
$\le$ :: term × term → bool
$c^s$ :: shape
$g^s$ :: shape × shape → shape
$\mathrm{Is}_{g^s}$ :: shape → bool
$g_1^s, g_2^s$ :: shape → shape
$\mathrm{sh}$ :: term → shape
$=^s$ :: shape × shape → bool
$\cap_{\_}, \cup_{\_}$ :: shape × term × term → term
$\_^{c_{\_}}$ :: shape × term → term
$1_{\_}, 0_{\_}$ :: shape → term
$|\_|_{\_} \ge k$, $|\_|_{\_} = k$ :: shape × term → bool


Figure 5: Operations and relations in structure $FT_2$


4.2 A Multisorted Logic 


To show the decidability of structure BS, we give a quantifier
elimination procedure for an extended structure, denoted
$FT_2$. We use a first-order two-sorted logic with sorts term
and shape interpreted over $FT_2$.

The domain of structure $FT_2$ is $\mathrm{FT}(\Sigma) \cup \mathrm{FT}(\Sigma_0)$ with
elements of $\mathrm{FT}(\Sigma)$ having sort term and elements of $\mathrm{FT}(\Sigma_0)$ having
sort shape. Variables in $\mathrm{Var}$ have term sort, variables in $\mathrm{Var}^s$
have shape sort. In general, if $t$ denotes an element of $FT_2$,
we write $t^s$ to indicate that the element has sort shape.

Figure [5] shows operations and relations in $FT_2$ with their
sort declarations. The signature is infinite because operations
$|t|_s \ge k$ and $|t|_s = k$ are parameterized by a non-negative
integer $k$.

We require all terms to be well-sorted. Functions $g_1$ and
$g_2$ are interpreted as partial selector functions in the term
constructor-selector language, so $D_{g_1} = D_{g_2} = \langle \langle x \rangle, \mathrm{Is}_g(x) \rangle$.
Similarly, $g_1^s$ and $g_2^s$ are partial selector functions in the
shape constructor-selector language, so $D_{g_1^s} = D_{g_2^s} =
\langle \langle x^s \rangle, \mathrm{Is}_{g^s}(x^s) \rangle$. The expressions $t_1 \cap_s t_2$ and $t_1 \cup_s t_2$ are defined
iff $\mathrm{sh}(t_1) = \mathrm{sh}(t_2) = s$, and $t^{c_s}$ is defined iff $\mathrm{sh}(t) = s$. We
therefore let

\[ D_{\cap} = D_{\cup} = \langle \langle y^s, x_1, x_2 \rangle,\ \mathrm{sh}(x_1) = y^s \wedge \mathrm{sh}(x_2) = y^s \rangle \]

and

\[ D_{c} = \langle \langle y^s, x \rangle,\ \mathrm{sh}(x) = y^s \rangle \]

For atomic formulas $|t|_s \ge k$ and $|t|_s = k$ we require atomic
formula $\mathrm{sh}(t) = s$ to ensure well-definedness:

\[ D_{|\_|_{\_} = k} = D_{|\_|_{\_} \ge k} = \langle \langle y^s, x \rangle,\ \mathrm{sh}(x) = y^s \rangle \]


Note that the language of Figure [5] subsumes the language
$\{a, b, g, \le\}$ for the structural subtyping structure. The
quantifier-elimination procedure we present in Section [4.3]
is therefore sufficient for quantifier elimination in the first-order
logic interpreted over the structural subtyping structure
BS.


4.3 Quantifier Elimination for Two Constants 


We are now ready to present a quantifier elimination procedure
for the structure $FT_2$. The quantifier elimination
procedure is based on the quantifier elimination for term algebras
of Section [3.4] as well as the quantifier elimination for
boolean algebras of Section [3.2].

We first define an auxiliary notion of a $u^s$-term as a term
formed starting from shape $u^s$ term variables and shape $u^s$
constants, using operations $\cap_{u^s}$, $\cup_{u^s}$, and $\_^{c_{u^s}}$.


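Definition 40 ($u^s$-terms) Let $u^s \in \mathrm{Var}^s$ be a shape variable.
The set of $u^s$-terms $\mathrm{Term}(u^s)$ is the least set such that:


1. $\mathrm{Var} \subseteq \mathrm{Term}(u^s)$

2. $0_{u^s}, 1_{u^s} \in \mathrm{Term}(u^s)$

3. if $t, t' \in \mathrm{Term}(u^s)$, then also
$t \cap_{u^s} t' \in \mathrm{Term}(u^s)$,
$t \cup_{u^s} t' \in \mathrm{Term}(u^s)$, and
$t^{c_{u^s}} \in \mathrm{Term}(u^s)$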


Similarly to base formulas of Section [3.4] we define structural
base formulas for the $FT_2$ structure. A structural base formula
contains a copy of a base formula for the shape sort
(shapeBase), a base formula for the term sort without term
disequalities (termBase), a formula expressing the mapping of
term variables to shape variables (hom), and cardinality constraints
on term parameter nodes of the term base formula
(cardin).


Definition 41 (Structural Base Formula)
A structural base formula with:


• free term variables $x_1, \ldots, x_m$;

• internal non-parameter term variables $u_1, \ldots, u_p$;
• internal parameter term variables $u_{p+1}, \ldots, u_{p+q}$;
• free shape variables $x_1^s, \ldots, x_{m^s}^s$;

• internal non-parameter shape variables $u_1^s, \ldots, u_{p^s}^s$;
• internal parameter shape variables $u_{p^s+1}^s, \ldots, u_{p^s+q^s}^s$


is a formula of form:

\[ \begin{array}{l}
\exists u_1, \ldots, u_n, u_1^s, \ldots, u_{n^s}^s. \\
\quad \mathrm{shapeBase}(u_1^s, \ldots, u_{n^s}^s, x_1^s, \ldots, x_{m^s}^s)\ \wedge \\
\quad \mathrm{termBase}(u_1, \ldots, u_n, x_1, \ldots, x_m)\ \wedge \\
\quad \mathrm{hom}(u_1, \ldots, u_n, u_1^s, \ldots, u_{n^s}^s)\ \wedge \\
\quad \mathrm{cardin}(u_{p+1}, \ldots, u_n, u_{p^s+1}^s, \ldots, u_{n^s}^s)
\end{array} \]

where $n = p + q$, $n^s = p^s + q^s$, and formulas shapeBase,
termBase, hom, and cardin are defined as follows.

\[ \mathrm{shapeBase}(u_1^s, \ldots, u_{n^s}^s, x_1^s, \ldots, x_{m^s}^s) =
\bigwedge_{i=1}^{p^s} u_i^s = t_i(u_1^s, \ldots, u_{n^s}^s)\ \wedge\ \bigwedge_{i=1}^{m^s} x_i^s = u_{j_i}^s\ \wedge\ \mathrm{distinct}(u_1^s, \ldots, u_{n^s}^s) \]

where each $t_i$ is a shape term of form $f(u_{i_1}^s, \ldots, u_{i_k}^s)$ for
some $f \in \Sigma_0$, $k = \mathrm{ar}(f)$, and $j : \{1, \ldots, m^s\} \to \{1, \ldots, n^s\}$ is
a function mapping indices of free shape variables to indices
of internal shape variables.

\[ \mathrm{termBase}(u_1, \ldots, u_n, x_1, \ldots, x_m) =
\bigwedge_{i=1}^{p} u_i = t_i(u_1, \ldots, u_n)\ \wedge\ \bigwedge_{i=1}^{m} x_i = u_{j_i} \]

where each $t_i$ is a term of form $f(u_{i_1}, \ldots, u_{i_k})$ for some
$f \in \Sigma$, $k = \mathrm{ar}(f)$, and $j : \{1, \ldots, m\} \to \{1, \ldots, n\}$ is a
function mapping indices of free term variables to indices of
internal term variables.

\[ \mathrm{hom}(u_1, \ldots, u_n, u_1^s, \ldots, u_{n^s}^s) = \bigwedge_{i=1}^{n} \mathrm{sh}(u_i) = u_{j_i}^s \]

where $j : \{1, \ldots, n\} \to \{1, \ldots, n^s\}$ is some function such
that $\{j_1, \ldots, j_p\} \subseteq \{1, \ldots, p^s\}$ and $\{j_{p+1}, \ldots, j_{p+q}\} \subseteq \{p^s + 1, \ldots, p^s + q^s\}$
(a term variable is a parameter variable iff its
shape is a parameter shape variable).

\[ \mathrm{cardin}(u_{p+1}, \ldots, u_{p+q}, u_{p^s+1}^s, \ldots, u_{p^s+q^s}^s) = \psi_1 \wedge \cdots \wedge \psi_d \]

where each $\psi_i$ is of form

\[ |t(u_{p+1}, \ldots, u_{p+q})|_{u^s} = k \]

or

\[ |t(u_{p+1}, \ldots, u_{p+q})|_{u^s} \ge k \]

for some $u^s$-term $t(u_{p+1}, \ldots, u_{p+q})$ that contains no variables
other than some of the variables $u_{p+1}, \ldots, u_{p+q}$, and
the following condition holds:

If a variable $u_{p+j}$ occurs in term $t(u_{p+1}, \ldots, u_{p+q})$, then $\mathrm{sh}(u_{p+j}) = u^s$
occurs in formula hom. (26)


We require each structural base formula to satisfy the
following conditions:


P0) the graph associated with shape base formula

\[ \exists u_1^s, \ldots, u_{n^s}^s.\ \mathrm{shapeBase}(u_1^s, \ldots, u_{n^s}^s, x_1^s, \ldots, x_{m^s}^s) \]

is acyclic (compare to Definition [21]);


P1) congruence closure property for shapeBase subformula:
there are no two distinct variables $u_i^s$ and $u_j^s$ such that
both $u_i^s = f(u_{l_1}^s, \ldots, u_{l_k}^s)$ and $u_j^s = f(u_{l_1}^s, \ldots, u_{l_k}^s)$ occur
as conjuncts in formula shapeBase;


P2) congruence closure property for termBase subformula:
there are no two distinct variables $u_i$ and $u_j$ such that
both $u_i = f(u_{l_1}, \ldots, u_{l_k})$ and $u_j = f(u_{l_1}, \ldots, u_{l_k})$ occur
as conjuncts in formula termBase;


P3) homomorphism property of sh: for every non-parameter
term variable $u$ such that $u = f(u_{i_1}, \ldots, u_{i_k})$ occurs
in termBase, if the conjunct $\mathrm{sh}(u) = u^s$ occurs in
hom, then for some shape variables $u_{j_1}^s, \ldots, u_{j_k}^s$ the
term $u^s = f^s(u_{j_1}^s, \ldots, u_{j_k}^s)$ occurs in shapeBase where
$f^s = \mathrm{shapified}(f)$ and for every $r$ where $1 \le r \le k$,
conjunct $\mathrm{sh}(u_{i_r}) = u_{j_r}^s$ occurs in hom.


According to Definition [41] a structural base formula contains
no selector function symbols. Formulation using selector
symbols is also possible, as in Definition. The
only partial function symbols occurring in a structural base
formula of Definition [41] are in the cardin subformula. Condition
(26) therefore ensures that functions in cardin and thus
the entire base formula are well-defined.

Note that acyclicity of shape base formula shapeBase
(condition P0) implies acyclicity of term base formula as
well. Namely, condition P3 ensures that any cycle in
termBase implies a cycle in shapeBase.

As in Section [3.4] we proceed to show that each quantifier-free
formula can be written as a disjunction of base formulas
and each base formula can be written as a quantifier-free
formula.

We strongly encourage the reader to study the following
example because it illustrates the idea behind our quantifier-elimination
decision procedure.


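Example 42 The following sentence is true in structure
$FT_2$.

\[ \begin{array}{l}
\forall x, y.\ x \le y \Rightarrow \\
\quad \exists z.\ z \le x \wedge z \le y\ \wedge \\
\quad\quad \forall w.\ w \le x \wedge w \le y \Rightarrow \\
\quad\quad\quad \forall v.\ g(v, z) \le g(z, v) \wedge \mathrm{Is}_g(v) \wedge \mathrm{Is}_g(w) \Rightarrow g_1(w) \le g_1(v)
\end{array} \tag{27} \]

An informal proof of sentence (27) is as follows. Suppose
that $x \le y$. Then $\mathrm{sh}(x) = \mathrm{sh}(y) = x^s$. Let $z = x \cap_{x^s} y$.
Now consider some $w$ such that $w \le x$ and $w \le y$. Then
$\mathrm{sh}(w) = x^s$, so $w \le z$. Suppose that $v$ is such that $g(v, z) \le
g(z, v)$. Then by covariance of $g$ we have $z \le v$, so $w \le v$. If
we assume $\mathrm{Is}_g(w)$ and $\mathrm{Is}_g(v)$, then $g_1(w)$ and $g_1(v)$ are well
defined and by covariance of $g$ we conclude $g_1(w) \le g_1(v)$,
as desired.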
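The informal argument can be replayed by brute force on small terms. The Python sketch below is an illustration, not part of the decision procedure: it assumes the encoding of constants as `"a"`, `"b"` and pairs as `("g", t1, t2)`, fixes the witness $z = x \cap y$ suggested by the informal proof, and enumerates terms of depth at most 2. For $x$ and $y$ in this universe the bound loses nothing for $w$ and $v$, because $w \le x$ forces $\mathrm{sh}(w) = \mathrm{sh}(x)$ and $g(v, z) \le g(z, v)$ forces $\mathrm{sh}(v) = \mathrm{sh}(z)$.

```python
from itertools import product


def leq(s, t):
    """Least reflexive order with a <= b and g covariant in both arguments."""
    if isinstance(s, str) and isinstance(t, str):
        return s == t or (s == "a" and t == "b")
    if isinstance(s, tuple) and isinstance(t, tuple):
        return leq(s[1], t[1]) and leq(s[2], t[2])
    return False


def meet(s, t):
    """Leafwise intersection of two terms of the same shape."""
    if isinstance(s, str):
        return "b" if (s, t) == ("b", "b") else "a"
    return ("g", meet(s[1], t[1]), meet(s[2], t[2]))


def terms(depth):
    ts = ["a", "b"]
    for _ in range(depth):
        ts = ["a", "b"] + [("g", l, r) for l, r in product(ts, ts)]
    return ts


def is_g(t):
    return isinstance(t, tuple)


def g1(t):
    return t[1]  # first selector; only called when is_g(t) holds


def sentence_27_holds(universe):
    for x, y in product(universe, repeat=2):
        if not leq(x, y):
            continue
        z = meet(x, y)  # witness for the existential quantifier
        if not (leq(z, x) and leq(z, y)):
            return False
        for w in universe:
            if not (leq(w, x) and leq(w, y)):
                continue
            for v in universe:
                premise = leq(("g", v, z), ("g", z, v)) and is_g(v) and is_g(w)
                if premise and not leq(g1(w), g1(v)):
                    return False
    return True


holds = sentence_27_holds(terms(2))
```

The check confirms the sentence for all $x, y$ of depth at most 2; the quantifier elimination that follows establishes it without any bound.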

We now give an alternative argument that shows that
sentence (27) is true. This alternative argument illustrates
the idea behind our quantifier-elimination decision procedure.
For the sake of brevity we perform some additional
simplifications along the way that are not part of the procedure
we present (although they could be incorporated to
improve efficiency), and we skip consideration of some
uninteresting cases during the case analyses.

Let us first eliminate the quantifier from formula

\[ \forall v.\ g(v, z) \le g(z, v) \wedge \mathrm{Is}_g(v) \wedge \mathrm{Is}_g(w) \Rightarrow g_1(w) \le g_1(v) \tag{28} \]

Formula (28) is equivalent to $\neg\exists v.\phi_1$ where

\[ \phi_1 \equiv g(v, z) \le g(z, v) \wedge \mathrm{Is}_g(v) \wedge \mathrm{Is}_g(w) \wedge \neg(g_1(w) \le g_1(v)) \tag{29} \]

We next use (21) to eliminate atomic formulas $t_1 \le t_2$ and
replace them with cardinality constraints, resulting in formula
$\phi_2$ equivalent to $\phi_1$:

\[ \phi_2 \equiv \phi_{2,1} \wedge \phi_{2,2} \]

where

\[ \begin{array}{rcl}
\phi_{2,1} &\equiv& |g(v, z) \cap g(z, v)^c|_{\mathrm{sh}(g(v,z))} = 0\ \wedge \\
&& \mathrm{sh}(g(v, z)) = \mathrm{sh}(g(z, v))\ \wedge \\
&& \mathrm{Is}_g(v) \wedge \mathrm{Is}_g(w)
\end{array} \tag{30} \]

and

\[ \phi_{2,2} \equiv \neg\bigl(|g_1(w) \cap g_1(v)^c|_{\mathrm{sh}(g_1(w))} = 0 \wedge \mathrm{sh}(g_1(w)) = \mathrm{sh}(g_1(v))\bigr) \tag{31} \]

Here we have written e.g.

\[ |g(v, z) \cap g(z, v)^c|_{\mathrm{sh}(g(v,z))} = 0 \]

as a shorthand for

\[ |g(v, z) \cap_{\mathrm{sh}(g(v,z))} g(z, v)^{c_{\mathrm{sh}(g(v,z))}}|_{\mathrm{sh}(g(v,z))} = 0 \]

(In general, we omit term shape arguments for boolean algebra
operations if the arguments are identical to the enclosing
term shape argument of the cardinality constraint.)

Figure 6: One of the Base Formulas Resulting from

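We next transform $\phi_2$ into a disjunction of well-defined
conjunctions. Following the ideas in Proposition [§] we transform
$\phi_{2,2}$ into $\phi_{3,1} \vee \phi_{3,2}$ where

\[ \phi_{3,1} \equiv |g_1(w) \cap g_1(v)^c|_{\mathrm{sh}(g_1(w))} \ge 1 \wedge \mathrm{sh}(g_1(w)) = \mathrm{sh}(g_1(v)) \tag{32} \]

and

\[ \phi_{3,2} \equiv \mathrm{sh}(g_1(w)) \ne \mathrm{sh}(g_1(v)) \]

and then transform $\phi_{2,1} \wedge \phi_{2,2}$ into

\[ (\phi_{2,1} \wedge \phi_{3,1}) \vee (\phi_{2,1} \wedge \phi_{3,2}) \]

For the sake of brevity we ignore the case $\phi_{2,1} \wedge \phi_{3,2}$; it is
possible to show that $\phi_{2,1} \wedge \phi_{3,2}$ is equivalent to false in the
context of the entire formula.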

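We transform $\phi_{2,1} \wedge \phi_{3,1}$ into unnested form, introducing
fresh existentially quantified variables $u_{vz}, u_{zv}, u_{w1}, u_{v1}, u_{vz}^s,
u_{w1}^s$ that denote terms occurring in $\phi_{2,1} \wedge \phi_{3,1}$. The result
is formula $\phi_4$ where

\[ \begin{array}{rcl}
\phi_4 &\equiv& \exists u_{vz}, u_{zv}, u_{w1}, u_{v1}, u_{vz}^s, u_{w1}^s. \\
&& u_{vz} = g(v, z) \wedge u_{zv} = g(z, v)\ \wedge \\
&& u_{w1} = g_1(w) \wedge u_{v1} = g_1(v)\ \wedge \\
&& u_{vz}^s = \mathrm{sh}(u_{vz}) \wedge u_{w1}^s = \mathrm{sh}(u_{w1})\ \wedge \\
&& \mathrm{sh}(u_{zv}) = u_{vz}^s \wedge \mathrm{sh}(u_{v1}) = u_{w1}^s\ \wedge \\
&& \mathrm{Is}_g(v) \wedge \mathrm{Is}_g(w)\ \wedge \\
&& |u_{vz} \cap u_{zv}^c|_{u_{vz}^s} = 0 \wedge |u_{w1} \cap u_{v1}^c|_{u_{w1}^s} \ge 1
\end{array} \tag{33} \]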
To transform $\phi_4$ into a disjunction of structural base formulas
we keep introducing new existentially quantified variables
and adding derived conjuncts to satisfy the invariants of
Definition [41].

Because $\mathrm{Is}_g(v)$ and $\mathrm{Is}_g(w)$ appear in the conjunct, we
give names to the remaining successors of $v$, $w$, by introducing
$u_{w2} = g_2(w)$, $u_{v2} = g_2(v)$. We may now write
the constraints in constructor language, using e.g. conjunct
$v = g(u_{v1}, u_{v2})$ instead of

\[ \mathrm{Is}_g(v) \wedge u_{v1} = g_1(v) \wedge u_{v2} = g_2(v) \]

To ensure that every term variable has an associated shape
variable, we introduce fresh variables $u_v^s, u_w^s, u_z^s, u_{w2}^s, u_{v2}^s$
with conjuncts $u_v^s = \mathrm{sh}(v)$, $u_w^s = \mathrm{sh}(w)$, $u_z^s = \mathrm{sh}(z)$, $u_{w2}^s =
\mathrm{sh}(u_{w2})$, $u_{v2}^s = \mathrm{sh}(u_{v2})$.

Note that a base formula contains a $\mathrm{distinct}(u_1^s, \ldots, u_{n^s}^s)$ subformula.
In the case when the current conjunction is not
strong enough to entail the disequality between shape variables
$u_i^s$ and $u_j^s$, we perform case analysis, considering the
case $u_i^s = u_j^s$ (then $u_i^s$ can be replaced by $u_j^s$), and the case
$u_i^s \ne u_j^s$. This case analysis will lead to a disjunction of structural
base formulas (unless some of the formulas is shown
contradictory in the transformation process). In contrast to
shape variables, we do not perform case analysis for disequality
of term variables, because termBase in Definition [41]
does not contain a distinct subformula.

In this example we perform case analysis on whether
$u_v^s = u_z^s$ and $u_v^s = u_w^s$ should hold. For the sake of the example
let us consider the case when $u_v^s = u_z^s = u_w^s$, $u_{w2}^s = u_{v2}^s$
and $u_{vz}^s, u_w^s, u_{w1}^s, u_{w2}^s$ are all distinct. In that case shape
variables $u_v^s, u_z^s, u_w^s$ denote the same shape, so let us replace
e.g. $u_z^s$ and $u_v^s$ with $u_w^s$. Similarly, we replace $u_{v2}^s$ with $u_{w2}^s$.
We obtain conjuncts $\mathrm{sh}(v) = u_w^s$, $\mathrm{sh}(z) = u_w^s$, $\mathrm{sh}(u_{v2}) = u_{w2}^s$.

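We next ensure homomorphism property P3 in Definition [41].
From conjuncts $u_{vz} = g(v, z)$, $\mathrm{sh}(u_{vz}) = u_{vz}^s$,
$\mathrm{sh}(v) = u_w^s$, and $\mathrm{sh}(z) = u_w^s$, we conclude

\[ u_{vz}^s = \mathrm{sh}(u_{vz}) = \mathrm{sh}(g(v, z)) = g^s(\mathrm{sh}(v), \mathrm{sh}(z)) = g^s(u_w^s, u_w^s) \]

so we add the conjunct $u_{vz}^s = g^s(u_w^s, u_w^s)$ to the formula.
Similarly, from $w = g(u_{w1}, u_{w2})$, $\mathrm{sh}(w) = u_w^s$, $\mathrm{sh}(u_{w1}) =
u_{w1}^s$, $\mathrm{sh}(u_{w2}) = u_{w2}^s$ we conclude $u_w^s = g^s(u_{w1}^s, u_{w2}^s)$ and add
this conjunct to the formula. Adding these two conjuncts
makes property P3 hold. (Note that, had we decided to
consider the case where $\mathrm{sh}(v) \ne \mathrm{sh}(z)$ we would have arrived
at a contradiction due to $\mathrm{sh}(u_{vz}) = \mathrm{sh}(u_{zv})$.)

We next apply rule (25) to reduce all cardinality constraints
into cardinality constraints on parameter nodes
(nodes $u$ for which there is no conjunct of form
$u = f(u_{i_1}, \ldots, u_{i_k})$). We replace $|u_{vz} \cap u_{zv}^c|_{u_{vz}^s} = 0$ with

\[ |v \cap z^c|_{u_w^s} = 0 \wedge |z \cap v^c|_{u_w^s} = 0 \tag{34} \]

Variable $v$ is not a parameter variable (we have $v = g(u_{v1}, u_{v2})$),
but $z$ still is, which prevents application of (25). We therefore introduce $u_{z1}$ and
$u_{z2}$ such that $z = g(u_{z1}, u_{z2})$. Because $\mathrm{sh}(z) = u_w^s$ we have
$\mathrm{sh}(u_{z1}) = u_{w1}^s$ and $\mathrm{sh}(u_{z2}) = u_{w2}^s$ by the homomorphism property.
We can now continue applying rule (25) to (34). The
result is:

\[ \begin{array}{l}
|u_{v1} \cap u_{z1}^c|_{u_{w1}^s} = 0 \wedge |u_{z1} \cap u_{v1}^c|_{u_{w1}^s} = 0\ \wedge \\
|u_{v2} \cap u_{z2}^c|_{u_{w2}^s} = 0 \wedge |u_{z2} \cap u_{v2}^c|_{u_{w2}^s} = 0
\end{array} \]

To make the formula conform to Definition [41] we introduce
internal variables $u_v, u_z, u_w$ corresponding to free variables
$v, z, w$, respectively. The resulting structural base formula
is

\[ \begin{array}{l}
\exists u_{vz}, u_{zv}, u_v, u_z, u_w, u_{v1}, u_{v2}, u_{z1}, u_{z2}, u_{w1}, u_{w2},
u_{vz}^s, u_w^s, u_{w1}^s, u_{w2}^s. \\
\quad \mathrm{shapeBase}_1 \wedge \mathrm{termBase}_1 \wedge \mathrm{hom}_1 \wedge \mathrm{cardin}_1
\end{array} \tag{35} \]

where

\[ \begin{array}{rcl}
\mathrm{shapeBase}_1 &=& u_{vz}^s = g^s(u_w^s, u_w^s) \wedge u_w^s = g^s(u_{w1}^s, u_{w2}^s)\ \wedge \\
&& \mathrm{distinct}(u_{vz}^s, u_w^s, u_{w1}^s, u_{w2}^s) \\[4pt]
\mathrm{termBase}_1 &=& u_{vz} = g(u_v, u_z) \wedge u_{zv} = g(u_z, u_v)\ \wedge \\
&& u_v = g(u_{v1}, u_{v2}) \wedge u_z = g(u_{z1}, u_{z2})\ \wedge \\
&& u_w = g(u_{w1}, u_{w2})\ \wedge \\
&& v = u_v \wedge z = u_z \wedge w = u_w \\[4pt]
\mathrm{hom}_1 &=& \mathrm{sh}(u_{vz}) = u_{vz}^s \wedge \mathrm{sh}(u_{zv}) = u_{vz}^s\ \wedge \\
&& \mathrm{sh}(u_v) = u_w^s \wedge \mathrm{sh}(u_z) = u_w^s \wedge \mathrm{sh}(u_w) = u_w^s\ \wedge \\
&& \mathrm{sh}(u_{v1}) = u_{w1}^s \wedge \mathrm{sh}(u_{z1}) = u_{w1}^s \wedge \mathrm{sh}(u_{w1}) = u_{w1}^s\ \wedge \\
&& \mathrm{sh}(u_{v2}) = u_{w2}^s \wedge \mathrm{sh}(u_{z2}) = u_{w2}^s \wedge \mathrm{sh}(u_{w2}) = u_{w2}^s \\[4pt]
\mathrm{cardin}_1 &=& |u_{v1} \cap u_{z1}^c|_{u_{w1}^s} = 0 \wedge |u_{z1} \cap u_{v1}^c|_{u_{w1}^s} = 0\ \wedge \\
&& |u_{v2} \cap u_{z2}^c|_{u_{w2}^s} = 0 \wedge |u_{z2} \cap u_{v2}^c|_{u_{w2}^s} = 0\ \wedge \\
&& |u_{w1} \cap u_{v1}^c|_{u_{w1}^s} \ge 1
\end{array} \]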


Figure [6] shows a graph representation of the subformulas
$\mathrm{shapeBase}_1$, $\mathrm{termBase}_1$, and $\mathrm{hom}_1$ of the resulting structural
base formula.

Recall that we are eliminating the quantification over $v$
from $\neg\exists v.\phi_1$. We can now existentially quantify over $v$. As
in Proposition [27] we simply remove the conjunct $v = u_v$
from termBase and the quantifier $\exists v$.

As in Figure [4] of Section [3.4] the structural base formula
form allows us to eliminate an existential quantifier,
whereas the quantifier-free form allows us to eliminate a
negation. We transform the structural base formula
into a quantifier-free formula as follows.


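We first use rule to eliminate variable $u_{vz}$, replacing
it with $g(u_v, u_z)$. In the resulting formula $g(u_v, u_z)$ occurs only
in hom, in the form

\[ \mathrm{sh}(g(u_v, u_z)) = u_{vz}^s \tag{36} \]

But (36) is a consequence of conjuncts $u_{vz}^s = g^s(u_w^s, u_w^s)$,
$\mathrm{sh}(u_v) = u_w^s$ and $\mathrm{sh}(u_z) = u_w^s$, so we omit (36) from the
formula. In an analogous way we eliminate variable $u_{zv}$ and
the conjuncts that contain it. We also eliminate $u_v$, analogously
to $u_{vz}$ and $u_{zv}$. In the resulting formula $u_{vz}^s$ occurs
only in the distinct subformula of shapeBase. Conjuncts
$u_{vz}^s \ne u_w^s$, $u_{vz}^s \ne u_{w1}^s$, and $u_{vz}^s \ne u_{w2}^s$ follow from the remaining
conjuncts in shapeBase by acyclicity. Hence we may
replace $\mathrm{distinct}(u_{vz}^s, u_w^s, u_{w1}^s, u_{w2}^s)$ by $\mathrm{distinct}(u_w^s, u_{w1}^s, u_{w2}^s)$.
Now $u_{vz}^s$ does not occur in the matrix of the formula, so we
may eliminate $\exists u_{vz}^s$ altogether.
The resulting formula is:

\[ \begin{array}{rcl}
\phi_5 &\equiv& \exists u_z, u_w, u_{v1}, u_{v2}, u_{z1}, u_{z2}, u_{w1}, u_{w2}, u_w^s, u_{w1}^s, u_{w2}^s. \\
&& u_w^s = g^s(u_{w1}^s, u_{w2}^s) \wedge \mathrm{distinct}(u_w^s, u_{w1}^s, u_{w2}^s)\ \wedge \\
&& u_z = g(u_{z1}, u_{z2}) \wedge u_w = g(u_{w1}, u_{w2})\ \wedge \\
&& z = u_z \wedge w = u_w\ \wedge \\
&& \mathrm{sh}(u_z) = u_w^s \wedge \mathrm{sh}(u_w) = u_w^s\ \wedge \\
&& \mathrm{sh}(u_{v1}) = u_{w1}^s \wedge \mathrm{sh}(u_{z1}) = u_{w1}^s \wedge \mathrm{sh}(u_{w1}) = u_{w1}^s\ \wedge \\
&& \mathrm{sh}(u_{v2}) = u_{w2}^s \wedge \mathrm{sh}(u_{z2}) = u_{w2}^s \wedge \mathrm{sh}(u_{w2}) = u_{w2}^s\ \wedge \\
&& |u_{v1} \cap u_{z1}^c|_{u_{w1}^s} = 0 \wedge |u_{z1} \cap u_{v1}^c|_{u_{w1}^s} = 0\ \wedge \\
&& |u_{v2} \cap u_{z2}^c|_{u_{w2}^s} = 0 \wedge |u_{z2} \cap u_{v2}^c|_{u_{w2}^s} = 0\ \wedge \\
&& |u_{w1} \cap u_{v1}^c|_{u_{w1}^s} \ge 1
\end{array} \tag{37} \]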
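We next eliminate $u_{v1}$. It suffices to eliminate it from conjuncts
where it occurs, so we consider formula $\phi_{5,1}$:

\[ \begin{array}{rcl}
\phi_{5,1} &\equiv& \exists u_{v1}. \\
&& \mathrm{sh}(u_{v1}) = u_{w1}^s \wedge \mathrm{sh}(u_{z1}) = u_{w1}^s \wedge \mathrm{sh}(u_{w1}) = u_{w1}^s\ \wedge \\
&& |u_{v1} \cap u_{z1}^c|_{u_{w1}^s} = 0 \wedge |u_{z1} \cap u_{v1}^c|_{u_{w1}^s} = 0\ \wedge \\
&& |u_{w1} \cap u_{v1}^c|_{u_{w1}^s} \ge 1
\end{array} \tag{38} \]

Note that all variables from $\phi_{5,1}$ belong to $B(s)$ where $s$
is the value of shape variable $u_{w1}^s$ (see (20)). This means
that we can apply quantifier elimination for boolean algebra
(Section [3.2]) to eliminate $u_{v1}$. The result is

\[ \phi_{5,2} \equiv \mathrm{sh}(u_{z1}) = u_{w1}^s \wedge \mathrm{sh}(u_{w1}) = u_{w1}^s \wedge |u_{w1} \cap u_{z1}^c|_{u_{w1}^s} \ge 1 \tag{39} \]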
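The boolean algebra step from (38) to (39) can be checked exhaustively for small algebras. The Python sketch below is an illustration under an assumed encoding: elements of $B_n$ are integers used as $n$-bit masks, and the names `card` and `check_elimination` are hypothetical. It verifies that a suitable $u_{v1}$ exists exactly when the cardinality constraint of (39) holds; the shape conjuncts are implicit, since all masks have the same width.

```python
def card(mask):
    """Number of elements of the set encoded by the bit mask."""
    return bin(mask).count("1")


def check_elimination(n):
    """In B_n, check for all z, w:
    (exists v. |v & ~z| = 0 and |z & ~v| = 0 and |w & ~v| >= 1)
    holds iff |w & ~z| >= 1."""
    full = (1 << n) - 1
    for z in range(full + 1):
        for w in range(full + 1):
            lhs = any(
                card(v & ~z & full) == 0
                and card(z & ~v & full) == 0
                and card(w & ~v & full) >= 1
                for v in range(full + 1)
            )
            rhs = card(w & ~z & full) >= 1
            if lhs != rhs:
                return False
    return True


all_ok = all(check_elimination(n) for n in range(5))
```

The two zero-cardinality constraints force $u_{v1} = u_{z1}$, so the only possible witness is $u_{z1}$ itself, which is why the remaining constraint simply substitutes $u_{z1}$ for $u_{v1}$.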
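Similarly, to eliminate $u_{v2}$ we consider formula $\phi_{5,3}$:

\[ \begin{array}{rcl}
\phi_{5,3} &\equiv& \exists u_{v2}. \\
&& \mathrm{sh}(u_{v2}) = u_{w2}^s \wedge \mathrm{sh}(u_{z2}) = u_{w2}^s \wedge \mathrm{sh}(u_{w2}) = u_{w2}^s\ \wedge \\
&& |u_{v2} \cap u_{z2}^c|_{u_{w2}^s} = 0 \wedge |u_{z2} \cap u_{v2}^c|_{u_{w2}^s} = 0
\end{array} \tag{40} \]

The result of boolean algebra quantifier elimination on $\phi_{5,3}$
is true (indeed, one may let $u_{v2} = u_{z2}$). The resulting base
formula with $u_{v1}$ and $u_{v2}$ eliminated is $\phi_6$:

\[ \begin{array}{rcl}
\phi_6 &\equiv& \exists u_z, u_w, u_{z1}, u_{z2}, u_{w1}, u_{w2}, u_w^s, u_{w1}^s, u_{w2}^s. \\
&& u_w^s = g^s(u_{w1}^s, u_{w2}^s) \wedge \mathrm{distinct}(u_w^s, u_{w1}^s, u_{w2}^s)\ \wedge \\
&& u_z = g(u_{z1}, u_{z2}) \wedge u_w = g(u_{w1}, u_{w2})\ \wedge \\
&& z = u_z \wedge w = u_w\ \wedge \\
&& \mathrm{sh}(u_z) = u_w^s \wedge \mathrm{sh}(u_w) = u_w^s\ \wedge \\
&& \mathrm{sh}(u_{z1}) = u_{w1}^s \wedge \mathrm{sh}(u_{w1}) = u_{w1}^s\ \wedge \\
&& \mathrm{sh}(u_{z2}) = u_{w2}^s \wedge \mathrm{sh}(u_{w2}) = u_{w2}^s\ \wedge \\
&& |u_{w1} \cap u_{z1}^c|_{u_{w1}^s} \ge 1
\end{array} \tag{41} \]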
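Observe that the equalities in $\phi_6$ are sufficient to express all
variables bound in $\phi_6$ in terms of free variables (all internal
variables are "covered"):

\[ \begin{array}{ll}
u_{z1} = g_1(z) & u_{z2} = g_2(z) \\
u_{w1} = g_1(w) & u_{w2} = g_2(w) \\
u_w^s = \mathrm{sh}(w) & \\
u_{w1}^s = g_1^s(\mathrm{sh}(w)) & u_{w2}^s = g_2^s(\mathrm{sh}(w))
\end{array} \tag{42} \]

Structural base formula $\phi_6$ is therefore equivalent to the
quantifier-free formula $\phi_{7,1}$:

\[ \begin{array}{rcl}
\phi_{7,1} &\equiv& \mathrm{Is}_{g^s}(\mathrm{sh}(w)) \wedge \mathrm{Is}_g(w) \wedge \mathrm{Is}_g(z)\ \wedge \\
&& \mathrm{distinct}(g_1^s(\mathrm{sh}(w)), g_2^s(\mathrm{sh}(w)))\ \wedge \\
&& \mathrm{sh}(z) = \mathrm{sh}(w) \wedge |g_1(w) \cap g_1(z)^c|_{g_1^s(\mathrm{sh}(w))} \ge 1
\end{array} \tag{43} \]

When transforming formula $\phi_4$ we chose the case $u_{w1}^s \ne u_{w2}^s$.
If we choose the case $u_{w1}^s = u_{w2}^s$, we obtain quantifier-free
formula $\phi_{7,2}$:

\[ \begin{array}{rcl}
\phi_{7,2} &\equiv& \mathrm{Is}_{g^s}(\mathrm{sh}(w)) \wedge \mathrm{Is}_g(w) \wedge \mathrm{Is}_g(z)\ \wedge \\
&& \mathrm{sh}(z) = \mathrm{sh}(w) \wedge g_1^s(\mathrm{sh}(w)) = g_2^s(\mathrm{sh}(w))\ \wedge \\
&& |g_1(w) \cap g_1(z)^c|_{g_1^s(\mathrm{sh}(w))} \ge 1
\end{array} \tag{44} \]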
Our quantifier elimination would also consider the case 
sh(g2(w)) A sh(go(z)). The procedure finds the case con- 
tradictory in a larger context, when eliminating 4z, because 
sh(z) = sh(x) = sh(w) follows from z < x and w < a. Ig- 
noring this case, we observe that ¢7,1 V 7,2 is equivalent to 
the quantifier-free formula ¢g, where 


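$$\begin{array}{l}
\phi_8 \equiv \mathrm{Is}_{g^s}(\mathrm{sh}(w)) \land \mathrm{Is}_g(w) \land \mathrm{Is}_g(z) \land \\
\quad \mathrm{sh}(z) = \mathrm{sh}(w) \land |g_1(w) \cap g_1(z)^c|_{g^s_1(\mathrm{sh}(w))} \geq 1
\end{array} \tag{45}$$
Let us therefore assume that the result of quantifier elimination in (28) is $\lnot\phi_8$.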
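We proceed to eliminate the next quantifier, $\forall w$, from
$$\forall w.\ w \leq x \land w \leq y \Rightarrow \lnot\phi_8 \tag{46}$$

(46) is equivalent to

$$\lnot \exists w.\ w \leq x \land w \leq y \land \phi_8$$

After eliminating $\leq$ we obtain

$$\begin{array}{l}
\lnot \exists w.\ |w \cap x^c|_{\mathrm{sh}(w)} = 0 \land \mathrm{sh}(x) = \mathrm{sh}(w) \land \\
\quad |w \cap y^c|_{\mathrm{sh}(w)} = 0 \land \mathrm{sh}(y) = \mathrm{sh}(w) \land \\
\quad \mathrm{Is}_{g^s}(\mathrm{sh}(w)) \land \mathrm{Is}_g(w) \land \mathrm{Is}_g(z) \land \\
\quad \mathrm{sh}(z) = \mathrm{sh}(w) \land |g_1(w) \cap g_1(z)^c|_{g^s_1(\mathrm{sh}(w))} \geq 1
\end{array} \tag{47}$$
We now proceed similarly as in eliminating variable $v$. The result is $\lnot\phi_9$ where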


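$$\begin{array}{l}
\phi_9 \equiv \mathrm{sh}(x) = \mathrm{sh}(z) \land \mathrm{sh}(y) = \mathrm{sh}(z) \land \\
\quad \mathrm{Is}_g(x) \land \mathrm{Is}_g(y) \land \mathrm{Is}_g(z) \land \mathrm{Is}_{g^s}(\mathrm{sh}(z)) \land \\
\quad |g_1(x) \cap g_1(y) \cap g_1(z)^c|_{g^s_1(\mathrm{sh}(z))} \geq 1
\end{array} \tag{48}$$

The remaining quantifiers that bind $x$, $y$, and $z$ are eliminated similarly.

To eliminate the quantifier $\exists z$, we need to transform $\lnot\phi_9$ into a disjunction of base formulas. This transformation requires negation of $\phi_9$ and creates several disjuncts. We consider only the two cases, $\phi_{10}$ and $\phi_{11}$, that are not contradictory in the enclosing context of conjuncts $z \leq x$ and $z \leq y$:

$$\phi_{10} \equiv \mathrm{sh}(x) = \mathrm{sh}(z) \land \mathrm{sh}(y) = \mathrm{sh}(z) \land \lnot\mathrm{Is}_{g^s}(\mathrm{sh}(z)) \tag{49}$$
$$\begin{array}{l}
\phi_{11} \equiv \mathrm{sh}(x) = \mathrm{sh}(z) \land \mathrm{sh}(y) = \mathrm{sh}(z) \land \\
\quad \mathrm{Is}_g(x) \land \mathrm{Is}_g(y) \land \mathrm{Is}_g(z) \land \mathrm{Is}_{g^s}(\mathrm{sh}(z)) \land \\
\quad |g_1(x) \cap g_1(y) \cap g_1(z)^c|_{g^s_1(\mathrm{sh}(z))} = 0
\end{array} \tag{50}$$
$\phi_{10}$ is equivalent to
$$\mathrm{sh}(x) = c^s \land \mathrm{sh}(y) = c^s \land \mathrm{sh}(z) = c^s \tag{51}$$
The result of eliminating $\exists z$ from
$$\exists z.\ |z \cap x^c|_{\mathrm{sh}(z)} = 0 \land |z \cap y^c|_{\mathrm{sh}(z)} = 0 \land \phi_{10}$$
is therefore
$$\phi_{10,2} \equiv \mathrm{sh}(x) = \mathrm{sh}(y) \land \lnot\mathrm{Is}_{g^s}(\mathrm{sh}(x)) \tag{52}$$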


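The result of eliminating $\exists z$ from

$$\exists z.\ |z \cap x^c|_{\mathrm{sh}(z)} = 0 \land |z \cap y^c|_{\mathrm{sh}(z)} = 0 \land \phi_{11}$$

is

$$\phi_{11,2} \equiv \mathrm{sh}(x) = \mathrm{sh}(y) \land \mathrm{Is}_{g^s}(\mathrm{sh}(x))$$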
¢10,2 V 11,2 is equivalent to sh(x) = sh(y). Converting 


sh(y) sh(y) 


to structural base formula yields true. We conclude that (27) 
is a true sentence in the structure FT2, which completes our 
quantifier elimination procedure example. 


4 


Formulas in Example 42 do not contain disequalities between term variables, only disequalities between shape variables. If a conjunction contains disequalities between term variables, we eliminate the disequalities using (22) in the process of converting the formula to a disjunction of structural base formulas. The following Example 43 illustrates this process.


sh(x) 


|x NM Y" |sh(a) =O0A sh(z) 
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Example 43 Consider the formula 
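$$\phi_6 \land u_z \neq u_w$$

where $\phi_6$ is given by (41). By (22), literal $u_z \neq u_w$ is equivalent to $\psi_1 \lor \psi_2$ where

$$\psi_1 \equiv \mathrm{sh}(u_z) \neq \mathrm{sh}(u_w) \tag{53}$$
and

$$\psi_2 \equiv \mathrm{sh}(u_z) = \mathrm{sh}(u_w) \land |(u_z \cap u_w^c) \cup (u_z^c \cap u_w)|_{\mathrm{sh}(u_z)} \geq 1 \tag{54}$$

In this case, formula $\phi_6 \land \psi_1$ is contradictory. Formula $\phi_6 \land \psi_2$ is equivalent to the following formula: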


es s 5. Ss 
$6 = WUz, Uw, Uz1, Uz2; Uwl; Uw2, Uw, Vwi, U2: 
Ss _osrs s “oat s s s 
Ury = O° (Uiy1, U2) A distinct(us,, ui,1, Uy) A 
Uz =9 Uz1, Uz2) A Uw = g(tw1, Uwe) A 


Z=uUz\w= Uw A 


sh(uz) = us, A sh(uw) = ui, A (55) 
sh(uz1) = unr A sh(tw1) = Ui A 
sh(uz2) = Ung A sh(Uw2) = Ue A 


Juwi A usilus,, [1A 


|(2z Muy) U (UZ uw)lus, > 1 


As in Example [42] we now apply rule to 
(tz Nuy) U (UZ A Uw)lus, > 1 


and transform ¢¢ into a disjunction of base formulas. 


4 


We proceed to sketch the general case of quantifier elimination. The following Proposition 44 is analogous to Proposition 27; the proof is again straightforward.


Proposition 44 (Quantification of Structural Base)
If $\beta$ is a structural base formula and $x$ a free term variable in $\beta$, then there exists a structural base formula $\beta_1$ equivalent to $\exists x.\beta$.


The following Proposition 45 corresponds to Proposition 28.


Proposition 45 (Quantifier-Free to Structural Base)
Every well-defined quantifier-free formula $\phi$ in the language of Figure 5 can be written as true, false, or a disjunction of structural base formulas.


Proof Sketch. Let $\phi$ be a well-defined quantifier-free formula in the language of Figure 5.

We first eliminate occurrences of $\leq$ in the formula, replacing them with cardinality constraints.

We then convert the formula into a disjunction $\phi_1 \lor \cdots \lor \phi_n$ of well-formed conjunctions of literals. We next describe how to transform each conjunction $\phi_i$ into a disjunction of base formulas.

Let ¢; be a conjunction of literals. Using the technique 
of Proposition [| we convert the formula to unnested form, 
adding existential quantifiers. We then eliminate unnested 


unnested form | cardinality constraint
$x = x_1 \cap x_2$ | $|x + (x_1 \cap x_2)|_s = 0$
$x = x_1 \cup x_2$ | $|x + (x_1 \cup x_2)|_s = 0$
$x = x_1^c$ | $|x + x_1^c|_s = 0$

Figure 7: Elimination of Boolean Algebra Unnested Formulas. Expression $x + y$ is a shorthand for $(x \cap y^c) \cup (y \cap x^c)$.


conjuncts that contain boolean algebra operations, according to Figure 7. The only atomic formulas in the resulting existentially quantified conjunction are of form $x = a$, $x = b$, $x = g(x_1, x_2)$, $\mathrm{Is}_g(x)$, $x_1 = g_1(x)$, $x_2 = g_2(x)$, $x_1 = x_2$, $x^s = c^s$, $x^s = g^s(x^s_1, x^s_2)$, $\mathrm{Is}_{g^s}(x^s)$, $x^s_1 = g^s_1(x^s)$, $x^s_2 = g^s_2(x^s)$, $x^s_1 = x^s_2$, $x^s = \mathrm{sh}(x)$, as well as $|t_1|_{x^s} \geq k$ and $|t_2|_{x^s} = k$ for some $x^s$-terms $t_1$ and $t_2$. The only negated atomic formulas are of form $x_1 \neq x_2$, $x^s_1 \neq x^s_2$, $\lnot\mathrm{Is}_g(x)$ and $\lnot\mathrm{Is}_{g^s}(x^s)$. As in the proof of Proposition 28, we use (17) to eliminate $\lnot\mathrm{Is}_g(x)$ and $\lnot\mathrm{Is}_{g^s}(x^s)$. This process leaves disequalities of form $x_1 \neq x_2$ and $x^s_1 \neq x^s_2$ as the only negated atomic formulas.
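The translation in Figure 7 can be checked by brute force on a small boolean algebra of sets. The following is a minimal sketch, assuming a hypothetical three-element universe standing in for the leaf positions of one fixed shape; the names `UNIVERSE`, `sym_diff`, and `check_figure7` are illustrative and do not come from the paper.

```python
from itertools import combinations

# Hypothetical universe: leaf positions of one fixed shape (an assumption).
UNIVERSE = frozenset({0, 1, 2})

def powerset(universe):
    """Return all subsets of `universe` as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(x):
    """Set complement relative to UNIVERSE."""
    return UNIVERSE - x

def sym_diff(x, y):
    """The expression x + y from the Figure 7 caption."""
    return (x & complement(y)) | (y & complement(x))

def check_figure7():
    """Check each row of Figure 7: the unnested equality holds exactly
    when the cardinality of the symmetric difference is zero."""
    subsets = powerset(UNIVERSE)
    for x in subsets:
        for x1 in subsets:
            assert (x == complement(x1)) == (len(sym_diff(x, complement(x1))) == 0)
            for x2 in subsets:
                assert (x == x1 & x2) == (len(sym_diff(x, x1 & x2)) == 0)
                assert (x == x1 | x2) == (len(sym_diff(x, x1 | x2)) == 0)
    return True
```

The check is exhaustive only for the chosen universe; it illustrates the equivalence and is not a proof.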

In the sequel, whenever we perform case analysis and generate a disjunction of conjunctions, existential quantifiers propagate to the conjunctions, so we keep working with an existentially quantified conjunction. The existentially quantified variables will become internal variables of a structural base formula.

We next convert conjuncts that contain only term vari- 
ables to a base formula, and convert shape part to base 
formula, as in the proof of Proposition [28] We simultane- 
ously make sure every term variable has an associated shape 
variable, introducing new shape variables if needed. (This 
process is interleaved with conversion to base formula, to en- 
sure that there is always a conjunct stating that newly intro- 
duced shape variables are distinct.) We also ensure homo- 
morphism requirement by replacing internal variables when 
we entail their equality. Another condition we ensure is that 
parameter term variables map to parameter shape variables, 
and non-parameter term variables to non-parameter shape 
variables; we do this by performing expansion of term and 
shape variables. We perform expansion of shape variables as 
in Section .2] Expansion of term variables is even simpler 
because there is no need to do case analysis on equality of 
term variable with other variables. 

The resulting existentially quantified conjunction might contain disequalities $u \neq u'$ between term variables. We eliminate these disequalities as explained in Example 43, by converting each disequality into a cardinality constraint using (22). In general, we need to consider the case when $\mathrm{sh}(u) \neq \mathrm{sh}(u')$ and generate another disjunct.


Elimination of disequalities might violate previously es- 
tablished homomorphism invariants, so we may need to 
reestablish these invariants by repeating the previously de- 
scribed steps. The overall process terminates because we 
never introduce new inequalities between term variables. 

As a final step, we convert all cardinality constraints into constraints on parameter term variables, using (25). In the case when the shape of a cardinality constraint is $c^s$, we cannot apply (25). However, in that case the homomorphism condition ensures that each of the participating variables is equal to $a$ or equal to $b$. This means that we can simply evaluate the cardinality constraint in the boolean algebra $\{a,b\}$. If the result is true we simply drop the constraint; otherwise the entire base formula becomes false.
This completes our sketch of transforming a quantifier-free formula into a disjunction of structural base formulas. ∎


We introduce the notion of covered variables in a structural base formula by generalizing Definition 29.


Definition 46 The set covering of variable coverings of a structural base formula $\beta$ is the least set $S$ of pairs $(u,t)$ where $u$ is an internal (shape or term) variable and $t$ is a term over the free variables of $\beta$, such that:

1. if $x = u$ occurs in termBase then $(u, x) \in S$;
2. if $x^s = u^s$ occurs in shapeBase then $(u^s, x^s) \in S$;
3. if $(u,t) \in S$ and $u = f(u_1,\ldots,u_k)$ occurs in termBase for some $f \in \Sigma$ then $\{(u_1, f_1(t)),\ldots,(u_k, f_k(t))\} \subseteq S$;
4. if $(u^s,t^s) \in S$ and $u^s = f^s(u^s_1,\ldots,u^s_k)$ occurs in shapeBase then $\{(u^s_1, f^s_1(t^s)),\ldots,(u^s_k, f^s_k(t^s))\} \subseteq S$;
5. if $(u,t) \in S$ and $\mathrm{sh}(u) = u^s$ occurs in hom then $(u^s, \mathrm{sh}(t)) \in S$.


Definition 47 An internal term variable $u$ is covered iff there exists a term $t$ such that $(u,t) \in S$. An internal shape variable $u^s$ is covered iff there exists a term $t^s$ such that $(u^s, t^s) \in S$.


Lemma 48 Let $\beta$ be a structural base formula with matrix $\beta_0$ and let covering be the covering of $\beta$.
1. If $(u,t) \in S$ then $\models \beta_0 \Rightarrow u = t$.
2. If $(u^s, t^s) \in S$ then $\models \beta_0 \Rightarrow u^s = t^s$.


Proof. By induction, using Definition 46. ∎


Corollary 49 Let $\beta$ be a structural base formula such that every internal variable is covered. Then $\beta$ is equivalent to a well-defined quantifier-free formula.


Proof. By Lemma 48, using (7). ∎


Lemma 50 Let $u$ be an uncovered non-parameter term variable in a structural base formula $\beta$ such that $u$ is a source, i.e., no conjunct of form

$$u' = f(u_1,\ldots,u,\ldots,u_k)$$

occurs in termBase. Let $\beta'$ be the result of dropping $u$ from $\beta$. Then $\beta$ is equivalent to $\beta'$.


Proof. Let $u$ occur in termBase in form

$$u = f(u_1,\ldots,u_k)$$

The only other occurrence of $u$ in $\beta$ is in hom and has the form $\mathrm{sh}(u) = u^s$. Because non-parameter term variables are mapped to non-parameter shape variables, shapeBase contains formula

$$u^s = \mathrm{shapified}(f)(u^s_1,\ldots,u^s_k) \tag{56}$$
where $u^s_1,\ldots,u^s_k$ are such that, by the homomorphism property, $\mathrm{sh}(u_i) = u^s_i$ occurs in hom. This means that the conjunct $\mathrm{sh}(u) = u^s$ is a consequence of the remaining conjuncts, so it may be omitted. After that, eliminating the existentially quantified variable $u$ together with its defining equation yields a structural base formula $\beta'$ not containing $u$, where $\beta'$ is equivalent to $\beta$. ∎


Corollary 51 Every base formula is equivalent to a base formula without uncovered non-parameter term variables.


Proof. If a structural base formula has an uncovered non-parameter term variable, then it has an uncovered non-parameter term variable that is a source. By repeated application of Lemma 50 we eliminate all uncovered non-parameter term variables. ∎


The next example illustrates how we deal with cardinality constraints $|1_s|_s \geq k$ and $|1_s|_s = k$, which contain no term variables. These constraints restrict the size of shape $s$. Luckily, we can translate them into shape base formula constraints.


Example 52 (Shape Term Size Constraints)
Let $x < y$ denote the conjunction $x \leq y \land x \neq y$. Let us eliminate quantifiers from formula $\exists x.\phi(x)$ where

$$\phi(x) \equiv \lnot(\exists y.\exists z.\ x < y \land y < z) \land \lnot(\exists u.\ u < x) \tag{57}$$

Eliminating variables $y, z$ from the first conjunct and variable $u$ from the second conjunct yields

$$\lnot\,|x^c|_{\mathrm{sh}(x)} \geq 2 \ \land\ \lnot\,|x|_{\mathrm{sh}(x)} \geq 1$$
which is equivalent to
$$(|x^c|_{\mathrm{sh}(x)} = 0 \lor |x^c|_{\mathrm{sh}(x)} = 1) \land |x|_{\mathrm{sh}(x)} = 0$$
and further to the disjunction
$$(|x^c|_{\mathrm{sh}(x)} = 0 \land |x|_{\mathrm{sh}(x)} = 0) \lor (|x^c|_{\mathrm{sh}(x)} = 1 \land |x|_{\mathrm{sh}(x)} = 0)$$

The first disjunct can be shown contradictory. Let us transform the second disjunct into a structural base formula. After introducing $u = x$ and $u^s = \mathrm{sh}(u)$, we obtain

$$\exists u, u^s.\ x = u \land \mathrm{sh}(u) = u^s \land |u|_{u^s} = 0 \land |u^c|_{u^s} = 1$$

Then $\exists x.\phi(x)$ is equivalent to

$$\exists u, u^s.\ \mathrm{sh}(u) = u^s \land |u|_{u^s} = 0 \land |u^c|_{u^s} = 1$$

Eliminating parameter term variable $u$ yields

$$\exists u^s.\ |1|_{u^s} = 1$$

Constraint $|1|_{u^s} = 1$ means that the largest set in the boolean algebra $B(s)$, where $s$ is the value of $u^s$, has size one. There exists exactly one boolean algebra of size one in the structure FT2, namely $\{a,b\}$. Therefore, $|1|_{u^s} = 1$ is equivalent to $u^s = c^s$. We may now eliminate $u^s$ by letting $u^s = c^s$. We conclude that the sentence $\exists x.\phi(x)$ is true.

Notice that we have also established that formula $\phi(x)$ is equivalent to $\mathrm{sh}(x) = c^s$, as a consequence of

$$|1_{\mathrm{sh}(x)}|_{\mathrm{sh}(x)} = 1$$


The following Proposition corresponds to Proposi- 
tion 


Proposition 53 (Struct. Base to Quantifier-Free) 
Every structural base formula 3 is equivalent to a quantifier- 
free formula ¢@ in the language of Figure 
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Proof Sketch. By Corollary we may assume that (3 
has no uncovered non-parameter term variables. By Corol- 
lary [49] we are done if there are no uncovered variables, so 
it suffices to eliminate uncovered parameter term variables 
and uncovered shape variables. 

Let $u$ be an uncovered parameter term variable. Then $u$ does not occur in termBase. Indeed, suppose for the sake of contradiction that $u$ occurs in termBase in some formula

$$u' = f(u_1,\ldots,u,\ldots,u_k)$$

Then $u'$ is an uncovered non-parameter variable in $\beta$, which is a contradiction because we have assumed $\beta$ has no uncovered non-parameter variables. Therefore, $u$ does not occur in termBase; it occurs only in hom and cardin. Let $\mathrm{sh}(u) = u^s$ occur in hom. Let $\psi_1,\ldots,\psi_p$ be all conjuncts of cardin that contain $u$. Each $\psi_i$ is of form $|t_i|_{u^s} \geq k_i$ or $|t_i|_{u^s} = k_i$ for some $u^s$-term $t_i$. Let $u_{j_1},\ldots,u_{j_q}$ be all term variables other than $u$ appearing in the terms $t_i$. Conjunct $\mathrm{sh}(u_{j_r}) = u^s$ occurs in hom for each $r$ where $1 \leq r \leq q$. The base formula can therefore be written in form


Gi = Avi,...,%e,21,...,27. PN G1 


All term variables in $\psi_1,\ldots,\psi_p$ range over terms of shape $[u^s]$. Therefore, $\phi_1$ defines a relation in the boolean algebra $B([u^s])$. This allows us to apply the construction in Section 3.2. We eliminate $u$ from $\psi_1 \land \ldots \land \psi_p$ and obtain a propositional combination $\psi_0$ of cardinality constraints with $u^s$-terms. $\psi_0$ does not contain variable $u$. We may assume that $\psi_0$ is in disjunctive normal form

$$\psi_0 = \alpha_1 \lor \cdots \lor \alpha_w$$


Let 
~1,i — 


for 1 <i<w. Base formula (; is equivalent to disjunction 
of base formulas (1,; where 


sh(u;,) =u A...A sh(uj,) =u A ag 


* ay. oA Pi, 


We have thus eliminated an uncovered parameter term vari- 
able u from (1. By repeating this process we eliminate all 
uncovered parameter term variables from a base formula. 
The resulting formula contains no uncovered term variables. 

It remains to eliminate uncovered shape variables. This process is similar to term algebra quantifier elimination. The essential part of the construction in Section 3.4 is a lemma that relies on the fact that uncovered parameter variables may take on infinitely many values. We therefore ensure that uncovered parameter shape variables are not constrained by term variables through conjuncts outside shapeBase.

Suppose that $u^s$ is an uncovered parameter shape variable in a base formula $\beta$. $u^s$ does not occur in termBase. $u^s$ does not occur in hom either, because all term variables are covered, and a conjunct $\mathrm{sh}(u) = u^s$ would imply that $u^s$

a | S 
Big] Abed Bag My 2 


is covered. The only possible occurrence of $u^s$ is in a cardinality constraint $\psi$ of subformula cardin, where $\psi$ is of form $|t|_{u^s} = k$ or of form $|t|_{u^s} \geq k$. Suppose there is some term variable $u$ occurring in $t$. Then $\mathrm{sh}(u) = u^s$, so $u^s$ is covered, which is a contradiction. Therefore, $t$ has no variables. $t$ can thus be simplified to either $0_{u^s}$ or $1_{u^s}$. In general, a constraint of form $|1|_{u^s} = k$ or $|1|_{u^s} \geq k$ is a domain cardinality constraint for boolean algebra $B([u^s])$ (see also (20)). A constraint containing $|0_{u^s}|$ is equivalent to true or false. A constraint $|1|_{u^s} = 0$ is equivalent to false. A constraint $|1|_{u^s} = k$ for $k \geq 1$ is equivalent to

$$u^s = t_1 \lor \cdots \lor u^s = t_p$$

where $t_1,\ldots,t_p$ is the list of all ground terms in signature $\Sigma_0$ that have exactly $k$ occurrences of constant $c^s$. We therefore generate a disjunction of base formulas $\beta_1,\ldots,\beta_p$ where $\beta_i$ results from $\beta$ by replacing $|1|_{u^s} = k$ with $u^s = t_i$. We convert each $\beta_i$ to a disjunction of base formulas by labelling subterms of $t_i$ by internal shape variables and doing case analysis on the equality between new internal shape variables to ensure the invariants of a base formula. The result is a disjunction of base formulas where variable $u^s$ occurs only in the shapeBase subformula.

Similarly, $|1|_{u^s} \geq k+1$ is equivalent to $\lnot(|1|_{u^s} \leq k)$ and thus to

$$u^s \neq t_1 \land \cdots \land u^s \neq t_q \tag{59}$$
where $t_1,\ldots,t_q$ is the list of all ground terms in signature $\Sigma_0$ that have at most $k$ occurrences of constant $c^s$. We replace $|1|_{u^s} \geq k+1$ by (59) and again convert the result to a disjunction of base formulas where $u^s$ occurs only in the shapeBase subformula.
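The finite lists of ground shape terms used above can be enumerated directly. The following is a minimal sketch for the FT2 setting, assuming the shape signature consists of the constant $c^s$ and one binary constructor $g^s$; the tuple encoding and the function names are illustrative assumptions, not notation from the paper.

```python
from functools import lru_cache

C = "c"  # stands for the constant shape c^s (encoding is an assumption)

@lru_cache(maxsize=None)
def shapes_with_leaves(k):
    """All ground shapes over {c^s, g^s} with exactly k occurrences of c^s.

    A shape is either C or a tuple ("g", left, right)."""
    if k < 1:
        return ()
    if k == 1:
        return (C,)
    result = []
    # Split the k leaves between the two arguments of g^s.
    for left_size in range(1, k):
        for left in shapes_with_leaves(left_size):
            for right in shapes_with_leaves(k - left_size):
                result.append(("g", left, right))
    return tuple(result)

def shapes_with_at_most(k):
    """All ground shapes with at most k occurrences of c^s, as used in (59)."""
    return tuple(s for n in range(1, k + 1) for s in shapes_with_leaves(n))
```

For example, `shapes_with_leaves(1)` contains only the shape $c^s$, which matches the observation in Example 52 that $|1|_{u^s} = 1$ forces $u^s = c^s$.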

Each of the resulting base formulas 31 are such that every 
uncovered variable in 3" is a shape variable that occurs only 
in shapeBase. Let 


Bi = Au,...,un,ui,.. + Upss Upsy ty. ++ 5 Upstgs 
shapeBase(uj,...,Uns,1,---,Lms) A 
termBase(ui,...,Un,21,---,;2m) A 
hom(ui,...,;Un;Ui,+-+,Un) A 
cardin(Up4p1,.- +, Uptgs Upsp1s ++ +5 Ups+gs) 


where uj,..., Ups are uncovered shape variables. Then ( ig 
equivalent to 37: 


Dire F s s 
B = FUL, +++) Un, Ups4+1)+++ 5 Ups+qs: 
s Ss s 
+> Upstqs,T1,--- ,XLims) 


Im) A 


? (Ups4a, 


termBase(ui,...,Un,@1,.-- 


s $s 
hom(ui,...,;Un;Ui,;---;Un) A 


Hy Ss 
cardin(up4i,-- ,Ups4qs) 


Here ¢? is a base formula (Definitions[19]andfea) whose free 
variables are variables free in 3? as well as all covered shape 
variables: 


2 
p (Ups415 eos 


shapeBase(tuj,..., Ups, Ups$1, Upstgs; L1,-- 


s 
+) Uptq) Ups+i1> cere 


s aa: Ses s 
Ls) = dui,..., Ups. 


s s 
» Ups+qs> U1, ety 
Ss 
: ,Lms) 


Applying Lemma [37| we conclude that ¢? is equivalent to 
some disjunction 
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of base formulas without uncovered variables. Let 6°" be the 
result of replacing ¢? with ¢*" in 6?. Then (3? is equivalent 


to 
k 


VV 8" 


il 


and each 3° has no uncovered variables either, because 
every free variable of ¢°" is either free or covered in 3°". 
By Corollary[49]each 6°" can be written as a quantifier free 
formula. = 


The following Theorem corresponds to of Sec- 
tion 


Theorem 54 (Two Constants Quant. Elimination)
There exist algorithms A, B such that for a given formula $\phi$ in the language of Figure 5:

a) A produces a quantifier-free formula $\phi'$ in selector language

b) B produces a disjunction $\phi'$ of structural base formulas


Proof. Analogous to proof of Theorem [39] using Propo- 
sition in place of Proposition and Proposition in 
place of Proposition B83] / 


Corollary 55 The first-order theory of the structure FT2 is 
decidable. 


This completes description of our quantifier elimination 
for the first-order theory of structure FT2, which models 
structural subtyping with two base types and one binary 
constructor. It is straightforward to extend the construc- 
tion of this section to any number of covariant constructors 
if the base formula has only two constants. In Section [5] we 
extend the result to any number of constants as well. Fi- 
nally, in Section [6] we extend the result to allow arbitrary 
decidable structures for primitive types, even if the number 
of primitive types is infinite. 


5 A Finite Number of Constants 


In this section we prove the decidability of structural subtyping of any finite number of constant symbols (primitive types) and any number of function symbols (constructors). We first show the result when all constructors are covariant; we then show the result when some of the constructors are contravariant.

We introduce the notion of $\Sigma$-term-power of some structure $\mathcal{C}$ as a generalization of the structure of structural subtyping.

We represent primitive types in structural subtyping as a structure $\mathcal{C}$ with a finite carrier $C$. We call $\mathcal{C}$ the base structure. Without loss of generality, we assume that $\mathcal{C}$ has only relations; functions and constants are definable using relations. Let $L_C$ be a set of relation symbols and let $\leq\ \in L_C$ be a distinguished binary relation symbol. $\leq$ represents the subtype ordering between types. $C$ is finite, so $\mathcal{C}$ is decidable (see Section 6 for the case when $C$ is infinite but decidable).

We represent type constructors as free operations in the term algebra with signature $\Sigma$. To represent the variance of constructors we define for each constructor $f \in \Sigma$ of arity $\mathrm{ar}(f) = k$ and each argument $1 \leq i \leq k$ the value $\mathrm{variance}(f,i) \in \{-1,1\}$. The constructor $f$ is covariant in argument $i$ iff $\mathrm{variance}(f,i) = 1$. For convenience we assume $\mathrm{ar}(f) \geq 1$ for each $f \in \Sigma$.

The $\Sigma$-term-power of $\mathcal{C}$ is a structure $\mathcal{P}$ defined as follows. Let $\Sigma' = \Sigma \cup C$. The domain of $\mathcal{P}$ is the set $P$ of finite ground $\Sigma'$-terms. Elements of $C$ are viewed as constants of arity 0. The structure $\mathcal{P}$ has signature $\Sigma \cup L_C$. The constructors $f \in \Sigma$ are interpreted in $\mathcal{P}$ as in a free term algebra:
$$[\![f]\!]^{\mathcal{P}}(t_1,\ldots,t_k) = f(t_1,\ldots,t_k)$$

A relation $r \in L_C \setminus \{\leq\}$ is interpreted pointwise on the terms of the same "shape" as follows. $[\![r]\!]^{\mathcal{P}}$ is the least relation $\rho$ such that:

1. if $[\![r]\!]^{\mathcal{C}}(c_1,\ldots,c_n)$ then $\rho(c_1,\ldots,c_n)$;
2. if $\rho(t_{1i},\ldots,t_{ni})$ for all $i$ where $1 \leq i \leq k$, then $\rho(f(t_{11},\ldots,t_{1k}),\ldots,f(t_{n1},\ldots,t_{nk}))$.

The relation $\leq\ \in L_C$ is interpreted similarly, but taking into account the variance. $[\![\leq]\!]^{\mathcal{P}}$ is the least relation $\rho$ such that

1. if $[\![\leq]\!]^{\mathcal{C}}(c_1, c_2)$ then $\rho(c_1, c_2)$;
2. if $\rho^{\mathrm{variance}(f,i)}(t_{1i}, t_{2i})$ for all $i$ where $1 \leq i \leq k$, then $\rho(f(t_{11},\ldots,t_{1k}), f(t_{21},\ldots,t_{2k}))$.

Here we use the notation $\rho^v$ for $v \in \{-1,1\}$ with the meaning: $\rho^1 = \rho$ and $\rho^{-1} = \{(y,x) \mid (x,y) \in \rho\}$.
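The interpretation of $\leq$ in the term-power can be read as a structural recursion on pairs of terms. The following is a minimal sketch under an assumed encoding: a primitive type is a string, a constructed term is a tuple `(f, t1, ..., tk)`, the base ordering is a set of pairs, and `variance` is a dictionary keyed by `(f, i)`. The names `subtype`, `BASE_LEQ`, and the `arrow` constructor are hypothetical illustrations.

```python
def subtype(t1, t2, base_leq, variance):
    """Decide t1 <= t2 in the term-power, following the two closure rules."""
    t1_const, t2_const = isinstance(t1, str), isinstance(t2, str)
    if t1_const and t2_const:
        # Rule 1: constants are compared in the base structure.
        return (t1, t2) in base_leq
    if t1_const or t2_const:
        # A constant and a constructed term never have the same shape.
        return False
    if t1[0] != t2[0] or len(t1) != len(t2):
        return False
    f = t1[0]
    # Rule 2: compare arguments pointwise, flipping contravariant ones.
    for i in range(1, len(t1)):
        left, right = t1[i], t2[i]
        if variance[(f, i)] == -1:
            left, right = right, left
        if not subtype(left, right, base_leq, variance):
            return False
    return True

# Hypothetical base structure with two primitive types, int <= real.
BASE_LEQ = {("int", "int"), ("real", "real"), ("int", "real")}
# Hypothetical function-type constructor: contravariant argument, covariant result.
VARIANCE = {("arrow", 1): -1, ("arrow", 2): 1}
```

With these assumptions, `subtype(("arrow", "real", "int"), ("arrow", "int", "real"), BASE_LEQ, VARIANCE)` holds while the converse does not, which is the usual behaviour of function subtyping.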

We next sketch the decidability of structural subtyping for any finite number of primitive types $C$. For now we assume that all constructors $f \in \Sigma$ are covariant; the relation $\leq$ thus does not play a special role.


5.1 Extended Term-Power Structure 


For the purpose of quantifier elimination we define the structure $\mathcal{P}_E$ by extending the domain and the set of operations of the term-power structure $\mathcal{P}$.

The domain of $\mathcal{P}_E$ is $P_E = P \cup P_S$ where $P_S$ is the set of shapes defined as follows. Let $\Sigma^s = \{c^s\} \cup \{f^s \mid f \in \Sigma\}$ be a set of function symbols such that $c^s$ is a fresh constant symbol with $\mathrm{ar}(c^s) = 0$ and $f^s$ are fresh distinct symbols with $\mathrm{ar}(f^s) = \mathrm{ar}(f)$ for each $f \in \Sigma$. The set of shapes $P_S$ is the set of ground $\Sigma^s$-terms. When referring to elements of $P_E$, by term we mean an element of $P$; by shape we mean an element of $P_S$. We write $X^s$ to denote an entity pertaining to shapes as opposed to terms, so $x^s$, $u^s$ denote variables ranging over shapes, and $t^s$ denotes terms that evaluate to shapes.

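The extended structure $\mathcal{P}_E$ contains term algebra operations on terms and shapes (including selector operations and tests, Page 61), the homomorphism sh, and cardinality constraint relations $|\phi|_{t^s} = k$ and $|\phi|_{t^s} \geq k$:

1. constructors in the term algebra of terms, $f \in \Sigma'$:
$[\![f]\!]^{\mathcal{P}_E}(t_1,\ldots,t_k) = f(t_1,\ldots,t_k)$;

2. selectors in the term algebra of terms:
$[\![f_i]\!]^{\mathcal{P}_E}(f(t_1,\ldots,t_k)) = t_i$;

3. constructor tests in the term algebra of terms:
$[\![\mathrm{Is}_f]\!]^{\mathcal{P}_E}(t) = \exists t_1,\ldots,t_k.\ t = f(t_1,\ldots,t_k)$;

4. constructors in the term algebra of shapes, $f^s \in \Sigma^s$:
$[\![f^s]\!]^{\mathcal{P}_E}(t^s_1,\ldots,t^s_k) = f^s(t^s_1,\ldots,t^s_k)$;

5. selectors in the term algebra of shapes:
$[\![f^s_i]\!]^{\mathcal{P}_E}(f^s(t^s_1,\ldots,t^s_k)) = t^s_i$;

6. constructor tests in the term algebra of shapes:
$[\![\mathrm{Is}_{f^s}]\!]^{\mathcal{P}_E}(t^s) = \exists t^s_1,\ldots,t^s_k.\ t^s = f^s(t^s_1,\ldots,t^s_k)$;

7. the homomorphism mapping terms to shapes such that:
$$[\![\mathrm{sh}]\!]^{\mathcal{P}_E}(f(t_1,\ldots,t_k)) = \mathrm{shapified}(f)([\![\mathrm{sh}]\!]^{\mathcal{P}_E}(t_1),\ldots,[\![\mathrm{sh}]\!]^{\mathcal{P}_E}(t_k)) \tag{60}$$
where
$$\mathrm{shapified}(f) = \begin{cases} c^s, & \text{if } f \in C \\ f^s, & \text{if } f \in \Sigma \end{cases} \tag{61}$$

8. cardinality constraint relations
$$[\![\,|\phi(x_1,\ldots,x_k)|_{t^s} = k\,]\!]^{\mathcal{P}_E}(t_1,\ldots,t_k) \ \equiv\ \big|[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(t_1,\ldots,t_k)\big| = k \tag{62}$$
and
$$[\![\,|\phi(x_1,\ldots,x_k)|_{t^s} \geq k\,]\!]^{\mathcal{P}_E}(t_1,\ldots,t_k) \ \equiv\ \big|[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(t_1,\ldots,t_k)\big| \geq k \tag{63}$$
where $\phi(x_1,\ldots,x_k)$ is a first-order formula over the base-structure language $L_C$ with free variables $x_1,\ldots,x_k$, term $t^s$ denotes a shape, and $k$ is a nonnegative integer constant.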


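It remains to complete the semantics of cardinality constraint relations, by defining the set $[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(t_1,\ldots,t_k)$. If $s$ is a shape, we call the set of positions of constant $c^s$ in $s$ the leaves of $s$, and denote it by $\mathrm{leaves}(s)$. We represent a leaf as a sequence of pairs $(f,i)$ where $f$ is a constructor of arity $k$ and $1 \leq i \leq k$. If $l \in \mathrm{leaves}(s)$ and $\mathrm{sh}(t) = s$, then $t[l]$ denotes the element $c \in C$ at position $l$ in term $t$, i.e., if $l = (f^1,i^1)\ldots(f^n,i^n)$ then
$$t[l] = f^n_{i^n}(\ldots f^2_{i^2}(f^1_{i^1}(t))\ldots) \tag{64}$$
We define:
$$[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(t_1,\ldots,t_k) = \{\, l \mid [\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{C}}(t_1[l],\ldots,t_k[l]) \,\} \tag{65}$$

The following equations follow from (65) and can be used as an equivalent alternative definition for cardinality relations:

$$\big|[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(c_1,\ldots,c_k)\big| = \begin{cases} 1, & [\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{C}}(c_1,\ldots,c_k) \\ 0, & \lnot[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{C}}(c_1,\ldots,c_k) \end{cases} \tag{66}$$
$$\begin{array}{l}
\big|[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(f(t_{11},\ldots,t_{1l}),\ldots,f(t_{k1},\ldots,t_{kl}))\big| \\
\quad = \big|[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(t_{11},\ldots,t_{k1})\big| + \ldots + \big|[\![\phi(x_1,\ldots,x_k)]\!]^{\mathcal{P}_E}(t_{1l},\ldots,t_{kl})\big|
\end{array} \tag{67}$$
Definition (65) generalizes Definition 2.1, Page 63.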
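The two ways of computing a cardinality, directly from (65) and recursively from (66) and (67), can be compared on small inputs. The following is a minimal sketch under an assumed encoding: base elements are strings, constructed terms are tuples `(f, t1, ..., tk)`, shapes reuse the constructor names with every base element replaced by `"c"`, and a base-structure formula is passed as a Python predicate. All function names are illustrative.

```python
def sh(t):
    """The homomorphism sh: replace every base element by the shape constant."""
    if isinstance(t, str):
        return "c"
    return (t[0],) + tuple(sh(arg) for arg in t[1:])

def leaves(s):
    """Positions of the shape constant in s, as tuples of (constructor, index)."""
    if s == "c":
        return [()]
    return [((s[0], i),) + rest
            for i in range(1, len(s)) for rest in leaves(s[i])]

def at(t, leaf):
    """t[l] as in (64): apply the selectors along the leaf path."""
    for f, i in leaf:
        assert t[0] == f
        t = t[i]
    return t

def card_direct(phi, terms):
    """Cardinality of the leaf set defined in (65)."""
    s = sh(terms[0])
    assert all(sh(t) == s for t in terms), "defined only for terms of equal shape"
    return sum(1 for l in leaves(s) if phi(*(at(t, l) for t in terms)))

def card_recursive(phi, terms):
    """Cardinality computed by the recursive equations (66) and (67)."""
    if isinstance(terms[0], str):
        return 1 if phi(*terms) else 0
    arity = len(terms[0]) - 1
    return sum(card_recursive(phi, [t[i] for t in terms])
               for i in range(1, arity + 1))

# Hypothetical base structure {a, b} with a <= b.
LEQ = {("a", "a"), ("a", "b"), ("b", "b")}
T1 = ("g", "a", ("g", "b", "a"))
T2 = ("g", "b", ("g", "a", "a"))
```

In this encoding the pointwise relation $t_1 \leq t_2$ (all constructors covariant) holds exactly when `card_direct(lambda x, y: (x, y) not in LEQ, [t1, t2]) == 0`, mirroring how a relation of $\mathcal{P}$ is expressed through a cardinality constraint.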
We write $|\phi(t_1,\ldots,t_k)|_{t^s} = k$ as a shorthand for the atomic formula $(|\phi(x_1,\ldots,x_k)|_{t^s} = k)(t_1,\ldots,t_k)$, and similarly for $|\phi(t_1,\ldots,t_k)|_{t^s} \geq k$. This is more than a notational convenience; see Section 6 for an approach which introduces sets of leaves as elements of the domain of $\mathcal{P}_E$ and defines a cylindric algebra interpreted over sets of leaves. The approach in this section follows [35] in merging the quantifier elimination for products and quantifier elimination for boolean algebras.

Some of the operations in $\mathcal{P}_E$ are partial. We deal with partial functions using the definitions and results given earlier in the paper. $f_i(t)$ is defined iff $\mathrm{Is}_f(t)$ holds, and $f^s_i(t^s)$ is defined iff $\mathrm{Is}_{f^s}(t^s)$ holds. Cardinality constraints $|\phi(t_1,\ldots,t_k)|_{t^s} = k$ and $|\phi(t_1,\ldots,t_k)|_{t^s} \geq k$ are defined iff $\mathrm{sh}(t_1) = \ldots = \mathrm{sh}(t_k) = t^s$ holds.

The structure $\mathcal{P}_E$ is at least as expressive as $\mathcal{P}$ because the only operations or relations present in $\mathcal{P}$ but not in $\mathcal{P}_E$ are $[\![r]\!]^{\mathcal{P}}$ for $r \in L_C$, and we can express $[\![r]\!]^{\mathcal{P}}(t_1,\ldots,t_k)$ as $|\lnot r(t_1,\ldots,t_k)|_{\mathrm{sh}(t_1)} = 0$.

Our goal is to give a quantifier elimination for first-order formulas of structure $\mathcal{P}_E$. By a quantifier-free formula we mean a formula without quantifiers outside cardinality constraints, e.g. the formula $|\forall x.\, x \leq t|_{t^s} = k$ is quantifier-free.


5.2 Structural Base Formulas 


In this section we define the notion of structural base for- 
mulas for any base structure C with a finite carrier. 

Definition of structural base formula for quantifier 
elimination in Pg differs from Definition[4]]in the conjuncts 
of cardin subformula. Instead of cardinality constraints on 
boolean algebra terms, Definition contains cardinality 
constraints on first-order formulas. 

The notion of base formula and Lemma 25 apply to terms $P$ as well as shapes $P_S$ in the structure $\mathcal{P}_E$ because shapes are also terms over the alphabet $\Sigma^s$. For brevity we write $u^*$ for an internal shape or term variable, and similarly $x^*$ for a free shape or term variable, $t^*$ for terms, $f^*$ for a term or shape term algebra constructor and $f^*_i$ for a term or shape term algebra selector.


Definition 56 (Structural Base Formula)
A structural base formula with:

• free term variables $x_1,\ldots,x_m$;
• internal non-parameter term variables $u_1,\ldots,u_p$;
• internal parameter term variables $u_{p+1},\ldots,u_{p+q}$;
• free shape variables $x^s_1,\ldots,x^s_{m^s}$;
• internal non-parameter shape variables $u^s_1,\ldots,u^s_{p^s}$;
• internal parameter shape variables $u^s_{p^s+1},\ldots,u^s_{p^s+q^s}$

is a formula of the form:

$$\begin{array}{l}
\exists u_1,\ldots,u_n, u^s_1,\ldots,u^s_{n^s}. \\
\quad \mathrm{shapeBase}(u^s_1,\ldots,u^s_{n^s}, x^s_1,\ldots,x^s_{m^s}) \land \\
\quad \mathrm{termBase}(u_1,\ldots,u_n, x_1,\ldots,x_m) \land \\
\quad \mathrm{termHom}(u_1,\ldots,u_n, u^s_1,\ldots,u^s_{n^s}) \land \\
\quad \mathrm{cardin}(u_{p+1},\ldots,u_n, u^s_{p^s+1},\ldots,u^s_{n^s})
\end{array}$$

where $n = p + q$, $n^s = p^s + q^s$, and formulas shapeBase, termBase, termHom, cardin are defined as follows.


$$\mathrm{shapeBase}(u^s_1,\ldots,u^s_{n^s}, x^s_1,\ldots,x^s_{m^s}) = \bigwedge_{i=1}^{p^s} u^s_i = t_i(u^s_1,\ldots,u^s_{n^s}) \ \land\ \bigwedge_{i=1}^{m^s} x^s_i = u^s_{j_i} \ \land\ \mathrm{distinct}(u^s_1,\ldots,u^s_{n^s})$$
where each $t_i$ is a shape term of the form $f^s(u^s_{i_1},\ldots,u^s_{i_k})$ for some $f \in \Sigma_0$, $k = \mathrm{ar}(f)$, and $j : \{1,\ldots,m^s\} \to \{1,\ldots,n^s\}$ is a function mapping indices of free shape variables to indices of internal shape variables.

$$\mathrm{termBase}(u_1,\ldots,u_n, x_1,\ldots,x_m) = \bigwedge_{i=1}^{p} u_i = t_i(u_1,\ldots,u_n) \ \land\ \bigwedge_{i=1}^{m} x_i = u_{j_i}$$
where each $t_i$ is a term of the form $f(u_{i_1},\ldots,u_{i_k})$ for some $f \in \Sigma$, $k = \mathrm{ar}(f)$, and $j : \{1,\ldots,m\} \to \{1,\ldots,n\}$ is a function mapping indices of free term variables to indices of internal term variables.

$$\mathrm{termHom}(u_1,\ldots,u_n, u^s_1,\ldots,u^s_{n^s}) = \bigwedge_{i=1}^{n} \mathrm{sh}(u_i) = u^s_{j_i}$$
where $j : \{1,\ldots,n\} \to \{1,\ldots,n^s\}$ is some function such that $\{j_1,\ldots,j_p\} \subseteq \{1,\ldots,p^s\}$ and $\{j_{p+1},\ldots,j_{p+q}\} \subseteq \{p^s+1,\ldots,p^s+q^s\}$ (a term variable is a parameter variable iff its shape is a parameter shape variable).

$$\mathrm{cardin}(u_{p+1},\ldots,u_n, u^s_{p^s+1},\ldots,u^s_{n^s}) = \psi_1 \land \cdots \land \psi_d$$
where each $\psi_i$ is a cardinality constraint of the form
$$|\phi(u_{j_1},\ldots,u_{j_l})|_{u^s} = k$$
or
$$|\phi(u_{j_1},\ldots,u_{j_l})|_{u^s} \geq k$$
where $\{j_1,\ldots,j_l\} \subseteq \{p+1,\ldots,n\}$ and the conjunct $\mathrm{sh}(u_{j_d}) = u^s$ occurs in termHom for $1 \leq d \leq l$. We require each structural base formula to satisfy the following conditions:


P0) the graph associated with shape base formula
$$\exists u^s_1,\ldots,u^s_{n^s}.\ \mathrm{shapeBase}(u^s_1,\ldots,u^s_{n^s}, x^s_1,\ldots,x^s_{m^s})$$
is acyclic;

P1) congruence closure property for shapeBase subformula: there are no two distinct variables $u^s_i$ and $u^s_j$ such that both $u^s_i = f(u^s_{l_1},\ldots,u^s_{l_k})$ and $u^s_j = f(u^s_{l_1},\ldots,u^s_{l_k})$ occur as conjuncts in formula shapeBase;

P2) congruence closure property for termBase subformula: there are no two distinct variables $u_i$ and $u_j$ such that both $u_i = f(u_{l_1},\ldots,u_{l_k})$ and $u_j = f(u_{l_1},\ldots,u_{l_k})$ occur as conjuncts in formula termBase;

P3) homomorphism property of sh: for every non-parameter term variable $u$ such that $u = f(u_{i_1},\ldots,u_{i_k})$ occurs in termBase, if conjunct $\mathrm{sh}(u) = u^s$ occurs in termHom, then for some shape variables $u^s_{j_1},\ldots,u^s_{j_k}$ the term $u^s = f^s(u^s_{j_1},\ldots,u^s_{j_k})$ occurs in shapeBase where $f^s = \mathrm{shapified}(f)$, and for every $r$ where $1 \leq r \leq k$, conjunct $\mathrm{sh}(u_{i_r}) = u^s_{j_r}$ occurs in termHom.


Note that the validity of the occur check for term variables follows from P0) and P3). Another immediate consequence of Definition 56 is the following Proposition 57.


Proposition 57 (Quantification of Str. Base Form.)
If $\beta$ is a structural base formula and $x$ a free shape or term variable in $\beta$, then there exists a structural base formula $\beta_1$ equivalent to $\exists x.\beta$.


We proceed to show that a quantifier-free formula can be 
written as a disjunction of base formulas, and a base formula 
can be written as a quantifier-free formula. 


5.3. Conversion to Base Formulas 


Conversion from a quantifier-free formula to the structural base formula is given by Proposition 58. The proof of Proposition 58 is analogous to the proof of Proposition 45 but uses (67) instead of (25).


Proposition 58 (Quantifier-Free to Structural Base)
Every well-defined quantifier-free formula $\phi$ is equivalent on $\mathcal{P}_E$ to true, false, or some disjunction of structural base formulas.


5.4 Conversion to Quantifier-Free Formulas 


The conversion from structural base formulas to quantifier-free formulas is similar to the case of two constant symbols in Section 4.3, but requires the use of the Feferman-Vaught technique.


Definition 59 The set determinations of variable determinations
of a structural base formula $\beta$ is the least set $S$ of
pairs $(u^*, t^*)$ where $u^*$ is an internal term or shape variable
and $t^*$ is a term over the free variables of $\beta$, such that:


1. if $x^* = u^*$ occurs in termBase or shapeBase, then
$(u^*, x^*) \in S$;


2. if $(u^*, t^*) \in S$ and $u^* = f^*(u^*_1,\ldots,u^*_k)$ occurs in
shapeBase or termBase then
$\{(u^*_1, f^*_1(t^*)), \ldots, (u^*_k, f^*_k(t^*))\} \subseteq S$;


3. if $\{(u^*_1, f^*_1(t^*)), \ldots, (u^*_k, f^*_k(t^*))\} \subseteq S$ and
$u^* = f^*(u^*_1,\ldots,u^*_k)$ occurs in shapeBase or termBase
then $(u^*, t^*) \in S$;


4. if $(u, t) \in S$ and $\mathrm{sh}(u) = u^S$ occurs in termHom then
$(u^S, \mathrm{sh}(t)) \in S$.
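The least-set construction of Definition 59 can be illustrated by a small fixed-point computation. The following Python sketch is not part of the original development: the tuple encodings of conjuncts and terms and the name `determinations` are assumptions made for illustration, selectors $f_i(t)$ are encoded as `("sel", f, i, t)`, and the input is assumed to satisfy the occur check so that the iteration terminates.

```python
def determinations(var_eqs, constructors, homs):
    """Least set closed under rules 1-4 of Definition 59 (illustrative encoding).

    var_eqs:      pairs (x, u) for conjuncts "x = u" (x free, u internal)
    constructors: triples (u, f, (u1, ..., uk)) for "u = f(u1, ..., uk)"
    homs:         pairs (u, us) for conjuncts "sh(u) = us"
    """
    S = {(u, ("var", x)) for x, u in var_eqs}               # rule 1
    changed = True
    while changed:
        new = set()
        for u, f, args in constructors:
            for v, t in S:
                if v == u:                                   # rule 2
                    for i, ui in enumerate(args, start=1):
                        new.add((ui, ("sel", f, i, t)))
                if args and v == args[0] and t[:3] == ("sel", f, 1):
                    t0 = t[3]                                # rule 3
                    if all((ui, ("sel", f, i, t0)) in S
                           for i, ui in enumerate(args, start=1)):
                        new.add((u, t0))
        for u, us in homs:                                   # rule 4
            for v, t in S:
                if v == u:
                    new.add((us, ("sh", t)))
        changed = not new <= S
        S |= new
    return S
```

An internal variable is then determined exactly when it occurs as the first component of some pair in the returned set.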


Definition 60 An internal variable $u^*$ is determined if
$(u^*, t^*) \in$ determinations for some term $t^*$. An internal
variable is undetermined if it is not determined.


Lemma 61 Let $\beta$ be a structural base formula with matrix
$\beta_0$ and let determinations be the determinations of $\beta$. If
$(u^*, t^*) \in$ determinations then $\models \beta_0 \Rightarrow u^* = t^*$.


Corollary 62 Let $\beta$ be a structural base formula such that
every internal variable is determined. Then $\beta$ is equivalent
to a well-defined quantifier-free formula.


Proof. By Lemma 61, using


    $\exists x.\ x = t \wedge \phi(x) \iff \phi(t)$    (68)


Lemma 63 Let $u$ be an undetermined non-parameter term
variable in a structural base formula $\beta$ such that $u$ is a
source, i.e. no conjunct of the form


    $u' = f(u_1,\ldots,u,\ldots,u_k)$


occurs in termBase. Let $\beta'$ be the result of removing $u$ and
conjuncts containing $u$ from $\beta$. Then $\beta$ is equivalent to $\beta'$.


Proof. The conjunct containing $u$ in termHom is a consequence
of the remaining conjuncts, so we drop it. We then
apply (68). ∎


Corollary 64 Every base formula is equivalent to a base
formula without undetermined non-parameter term variables.


Proof. If a structural base formula has an undetermined
non-parameter term variable, then it has an undetermined
non-parameter term variable that is a source.
Repeatedly apply Lemma 63 to eliminate all undetermined
non-parameter term variables. ∎


The following Lemma 65 is a consequence of the fact that
terms of a fixed shape $s$ form a substructure of P isomorphic
to the finite power $C^m$ where $m = |\mathrm{leaves}(s)|$, and follows
from the Feferman-Vaught theorem in Section 3.3.


Lemma 65 Let


    $\alpha \equiv \exists u.\ \mathrm{sh}(u) = u^S \wedge \mathrm{sh}(u_1) = u^S \wedge \ldots \wedge \mathrm{sh}(u_n) = u^S \wedge \psi_1 \wedge \ldots \wedge \psi_p$    (69)


where each $\psi_i$ is a cardinality constraint of the form
$|\phi|_{u^S} = k$ or $|\phi|_{u^S} \ge k$ where all free variables of
$\phi$ are among $u, u_1, \ldots, u_n$. Then there exists a formula
$\psi$ such that $\psi$ is a disjunction of conjunctions of
cardinality constraints $|\phi'|_{u^S} = k$ and $|\phi'|_{u^S} \ge k$
where the free variables in each $\phi'$ are among
$u_1, \ldots, u_n$, and formula $\alpha$ is equivalent on Pr to $\alpha'$
where


    $\alpha' \equiv \mathrm{sh}(u_1) = u^S \wedge \ldots \wedge \mathrm{sh}(u_n) = u^S \wedge \psi$    (70)


Proposition 66 (Struct. Base to Quantifier-Free)
Every structural base formula $\beta$ is equivalent on Pr to
some well-defined quantifier-free formula $\phi$.


Proof Sketch. By Corollary 64 we may assume that $\beta$ has
no undetermined non-parameter term variables. By Corollary 62
we are done if there are no undetermined variables,
so it suffices to eliminate undetermined parameter term
variables and undetermined shape variables.

Let $u$ be an undetermined parameter term variable. $u$
does not occur in termBase because it cannot have a successor
or a predecessor in the graph associated with the term base
formula. Therefore, $u$ occurs only in termHom and cardin.
Let $u^S$ be the shape variable such that $u^S = \mathrm{sh}(u)$ occurs
in termHom. Let $\psi_1, \ldots, \psi_p$ be all conjuncts of cardin that
contain $u$.

Each $\psi_i$ is of the form $|\phi|_{u^S} \ge k_i$ or $|\phi|_{u^S} = k_i$ and for
each variable $u'$ free in $\phi$ the conjunct $\mathrm{sh}(u') = u^S$ occurs
in termHom. The base formula can therefore be written in
form


    $\beta_1 \equiv \exists x_1,\ldots,x_q.\ \beta_0 \wedge \alpha$


where $\alpha$ has the form as in Lemma 65. Applying Lemma 65
we eliminate $u$ and obtain $\psi \equiv \bigvee_i \alpha_i$ where each $\alpha_i$ is
a conjunction of cardinality constraints. Base formula $\beta_1$ is
thus equivalent to the disjunction $\bigvee_i \beta_{1,i}$ where each $\beta_{1,i}$
is a base formula


    $\beta_{1,i} \equiv \exists x_1,\ldots,x_q.\ \beta_0 \wedge \alpha_i$

By repeating this process we eliminate all undetermined pa- 
rameter term variables from a base formula. Each of the 
resulting base formulas contains no undetermined term vari- 
ables. 


It remains to eliminate undetermined shape variables.
This process is similar to term algebra quantifier elimination;
the key ingredient is a lemma that relies on
the fact that undetermined parameter variables may take
on infinitely many values. We therefore ensure that
undetermined parameter shape variables are not constrained
by term and parameter variables through conjuncts outside
shapeBase.

Consider an undetermined parameter shape variable $u^S$.
$u^S$ does not occur in termHom, because all term variables
are determined and a conjunct $u^S = \mathrm{sh}(u)$ would imply that
$u^S$ is determined as well. $u^S$ can thus occur only in cardin
within some cardinality constraint $|\phi|_{u^S} = k$ or $|\phi|_{u^S} \ge k$.
Moreover, formula $\phi$ in each such cardinality constraint is
closed: otherwise $\phi$ would contain some free variable $u$; by
definition of base formula $u$ would have to be a parameter
variable; all parameter term variables are determined,
so $u^S$ would be determined as well. Let $u^S$ denote some
shape $s$. Because $\phi$ is a closed formula, $|\phi|_s$ is equal to
0 if $[\![\phi]\!]^C = \mathrm{false}$ and to the shape size $m = |\mathrm{leaves}(s)|$ if
$[\![\phi]\!]^C = \mathrm{true}$. (The fact that closed formulas reduce to the
constraints on domain size appears in [Theorem 3.36,
Page 13].) After eliminating constraints equivalent to $0 = k$
and $0 \ge k$, we obtain a conjunction of simple linear
constraints of the form $m = k$ and $m \ge k$. These constraints
specify a finite or infinite set $S \subseteq \{0, 1, \ldots\}$ of possible sizes
$m$. Let $A = \{s \mid |\mathrm{leaves}(s)| \in S\}$. If the set $S$ is infinite
then it contains an infinite interval of form $\{m_0, m_0 + 1, \ldots\}$
so the set $A$ is infinite. If $\Sigma$ contains a unary constructor
and $S$ is nonempty, then $A$ is infinite. If $\Sigma$ contains
no unary constructors and $S$ is finite then $A$ is finite and
the cardinality constraints containing $u^S$ are equivalent to
$\bigvee_{i=1}^{p} u^S = t_i$ where $A = \{t_1, \ldots, t_p\}$. We therefore
generate a disjunction of base formulas $\beta_1, \ldots, \beta_p$ where $\beta_i$
results from $\beta$ by replacing cardinality constraints containing
$u^S$ with $u^S = t_i$. We convert each $\beta_i$ to a disjunction
of base formulas by labelling subterms of $t_i$ with internal
shape variables and doing case analysis on the equality
between new internal shape variables to ensure the invariants
of a base formula, as in the proof of Proposition 58. By repeating
this process for all shape variables $u^S$ where the set $S$ is finite,
we obtain base formulas where the set $A$ is infinite for every
undetermined parameter shape variable $u^S$. We may then
eliminate all undetermined parameter and non-parameter
shape variables along with the conjuncts that contain them.
The result is an equivalent formula by Lemma 25.

All variables in each of the resulting base formulas are
determined. By Corollary 62 each formula can be written
as a quantifier-free formula, and the resulting disjunction is
a quantifier-free formula. ∎


5.5 One-Relation-Symbol Variance 


So far we have assumed that all constructors are covariant.
In this section we describe the changes needed to extend
the result to the case when the constructors have arbitrary
variance with respect to some distinguished binary relation
denoted $\le$.


Definition 67 If $\phi$ is a first-order formula in the language
$L_C$, the contravariant version of $\phi$, denoted $\phi^{(-1)}$, is defined
by induction on the structure of the formula by:


    $(r(t_1,\ldots,t_k))^{(-1)} = r(t_1,\ldots,t_k)$, if $r \in L_C \setminus \{\le\}$
    $(t_1 \le t_2)^{(-1)} = t_2 \le t_1$
    $(\phi_1 \wedge \phi_2)^{(-1)} = \phi_1^{(-1)} \wedge \phi_2^{(-1)}$
    $(\phi_1 \vee \phi_2)^{(-1)} = \phi_1^{(-1)} \vee \phi_2^{(-1)}$
    $(\neg\phi)^{(-1)} = \neg\phi^{(-1)}$
    $(\exists t.\phi)^{(-1)} = \exists t.\phi^{(-1)}$
    $(\forall t.\phi)^{(-1)} = \forall t.\phi^{(-1)}$
                                                        (71)
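The inductive clauses of Definition 67 translate directly into a recursive function. The sketch below is only an illustration: the encoding of formulas as nested tuples with tags such as `"leq"` and `"rel"` and the name `contravariant` are assumptions, not notation from the paper.

```python
def contravariant(phi):
    """Contravariant version of a formula encoded as a nested tuple.

    Assumed encoding: ("leq", t1, t2), ("rel", r, args), ("and", a, b),
    ("or", a, b), ("not", a), ("exists", x, a), ("forall", x, a).
    """
    tag = phi[0]
    if tag == "leq":                        # t1 <= t2 becomes t2 <= t1
        _, t1, t2 = phi
        return ("leq", t2, t1)
    if tag == "rel":                        # other relations are unchanged
        return phi
    if tag in ("and", "or"):
        return (tag, contravariant(phi[1]), contravariant(phi[2]))
    if tag == "not":
        return ("not", contravariant(phi[1]))
    if tag in ("exists", "forall"):         # keep the bound variable
        return (tag, phi[1], contravariant(phi[2]))
    raise ValueError(f"unknown formula tag: {tag!r}")
```

Applying the function twice returns the original formula, which matches the intuition that reversing the ordering twice gives back the ordering.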


Define $C^{-1}$ to have the same domain and the same interpretation
of operations and relations $r \in L_C \setminus \{\le\}$ but where


    $[\![\le]\!]^{C^{-1}} = ([\![\le]\!]^{C})^{-1}$    (72)

We clearly have for every formula $\phi$ and every valuation $\sigma$:

    $[\![\phi^{(-1)}]\!]^{C} \sigma = [\![\phi]\!]^{C^{-1}} \sigma$    (73)


If $l \in \mathrm{leaves}(s)$ is a leaf $l = \langle f^1, i^1 \rangle \ldots \langle f^n, i^n \rangle$, define
variance($l$) as the product of integers


    $\prod_{j=1}^{n} \mathrm{variance}(f^j, i^j)$    (74)

We generalize to

    $[\![\phi(x_1,\ldots,x_n)]\!]^{s}(t_1,\ldots,t_n) = \{\, l \mid [\![\phi(x_1,\ldots,x_n)]\!]^{C^l}(t_1[l],\ldots,t_n[l]) \,\}$    (75)


where $C^l$ denotes $C$ for variance($l$) = 1 and $C^{-1}$ for
variance($l$) = −1. Hence, the isomorphism between terms of some
fixed shape $s$ with $|\mathrm{leaves}(s)| = m$ and $C^m$ breaks, but there
is still an isomorphism with $C^{P(s)} \times (C^{-1})^{N(s)}$ where


    $P(s) = |\{\, l \in \mathrm{leaves}(s) \mid \mathrm{variance}(l) = 1 \,\}|$
    $N(s) = |\{\, l \in \mathrm{leaves}(s) \mid \mathrm{variance}(l) = -1 \,\}|$    (76)


Because of this isomorphism, Lemma 65 still holds and we
may still use the Feferman-Vaught theorem from Section 3.3.
Equation (67) generalizes to:


    $|[\![\phi(x_1,\ldots,x_n)]\!]^{s}(f(t_{11},\ldots,t_{1k}),\ldots,f(t_{n1},\ldots,t_{nk}))|$
        $= \sum_{i=1}^{k} |[\![\phi^{(\mathrm{variance}(f,i))}(x_1,\ldots,x_n)]\!]^{s_i}(t_{1i},\ldots,t_{ni})|$    (77)

The only change in the proof of Proposition 58 is the use
of (77) instead of (67). Most of the proof of Proposition 66
remains unchanged as well; the only additional difficulty is
eliminating constraints of the form $|\phi|_{u^S} = k$ and $|\phi|_{u^S} \ge k$
where $u^S$ is a parameter shape variable and $\phi$ is a closed
formula. Lemma 68 below addresses this problem.

We say that an algorithm $g$ finitely computes some function
$f : A \to 2^B$ where $B$ is an infinite set iff $g$ is a function
from $A$ to the set $\mathrm{Fin}(B) \cup \{\infty\}$ where $\mathrm{Fin}(B)$ is the set of
finite subsets of set $B$, $\infty$ is a fresh symbol, and


    $g(a) = f(a)$,    if $f(a) \in \mathrm{Fin}(B)$
    $g(a) = \infty$,       if $f(a) \notin \mathrm{Fin}(B)$    (78)


Lemma 68 There exists an algorithm that, given a shape
variable $u^S$ and a conjunction $\psi \equiv \bigwedge_{i=1}^{n} \psi_i$ of cardinality
constraints where each $\psi_i$ is of form $|\phi_i|_{u^S} = k_i$ or $|\phi_i|_{u^S} \ge k_i$
for some closed formula $\phi_i$, finitely computes the set


    $A = \{\, s \mid [\![\psi]\!]^{P}[u^S \mapsto s] \,\}$    (79)

of shapes which satisfy $\psi$ in P.


Proof Sketch. Let $\phi$ be a closed formula in language $L_C$.
Compute $[\![\phi]\!]^C$ and $[\![\phi^{(-1)}]\!]^C$ and then replace $|\phi|_s$ with one
of the expressions $P(s) + N(s)$, $P(s)$, $N(s)$, 0 according to
the following table.


    $[\![\phi]\!]^C$      $[\![\phi^{(-1)}]\!]^C$      $|\phi|_s =$

    true        true            $P(s) + N(s)$
    true        false           $P(s)$
    false       true            $N(s)$
    false       false           0
                                                (80)
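To make the quantities $P(s)$ and $N(s)$ and table (80) concrete, the following Python sketch counts the leaves of a shape by variance. It is an illustration only: the tuple encoding of shapes, the name `VARIANCE`, and the helper names are assumptions, and the sample signature (a constructor `f` contravariant in its first argument and a covariant constructor `g`) is borrowed from Example 69.

```python
# Hypothetical signature: per-argument variance of each constructor.
VARIANCE = {"f": (-1, 1), "g": (1, 1)}

def count_leaves(shape, sign=1):
    """Return (P(s), N(s)): the numbers of leaves of variance +1 and -1.

    A shape is either the string "c" (a single leaf) or a tuple
    (constructor_name, child_1, ..., child_k).
    """
    if shape == "c":
        return (1, 0) if sign == 1 else (0, 1)
    f, *children = shape
    p = n = 0
    for child, v in zip(children, VARIANCE[f]):
        cp, cn = count_leaves(child, sign * v)   # variance multiplies along the path
        p, n = p + cp, n + cn
    return p, n

def cardinality(phi_in_C, phi_inv_in_C, shape):
    """Value of |phi|_s for a closed phi, following table (80)."""
    p, n = count_leaves(shape)
    return (p if phi_in_C else 0) + (n if phi_inv_in_C else 0)
```

For the shape `("f", ("f", "c", "c"), "c")` the three leaves have variances +1, −1 and +1, so `count_leaves` returns `(2, 1)`.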


The constraints of the form $N(s) + P(s) = k$ and $N(s) +
P(s) \ge k$ can be expressed as propositional combinations of
constraints of the form $N(s) = k$, $P(s) = k$, $P(s) \ge k$ and
$N(s) \ge k$. Therefore, $\psi$ can be written as a propositional
combination of these four kinds of constraints and each
conjunction $C(s)$ can further be assumed to have one of the
forms:


    F1) $C_{k_P,k_N}(s) \equiv P(s) = k_P \wedge N(s) = k_N$;
    F2) $C_{k_P,k_N^+}(s) \equiv P(s) = k_P \wedge N(s) \ge k_N$;
    F3) $C_{k_P^+,k_N}(s) \equiv P(s) \ge k_P \wedge N(s) = k_N$;
    F4) $C_{k_P^+,k_N^+}(s) \equiv P(s) \ge k_P \wedge N(s) \ge k_N$.


Let $A$ be the set of shapes $s$ such that $C(s)$ holds. To compute
$A$ when $\Sigma$ contains unary constructors, we first restrict $\Sigma$
to the language $\Sigma'$ with no unary constructors, and compute
the set $A' \subseteq A$ using language $\Sigma'$. If $A'$ is empty, so is $A$,
otherwise $A$ is infinite. Assume that $\Sigma$ contains no unary
constructors. Assume further that $\Sigma$ contains at least one
binary constructor and that at least one constructor is
contravariant in some argument.


Let
    $S = \{\, (P(s), N(s)) \mid s \in A \,\}$


Because $P(s) + N(s) = |\mathrm{leaves}(s)|$ and there are only finitely
many shapes of any given size (every constructor is of arity
at least two), it suffices to finitely compute $S$. $S$ can be
given an alternative characterization as follows. If $f \in \Sigma$,
$\mathrm{ar}(f) = k$, $f$ is covariant in $l$ arguments and contravariant
in $k - l$ arguments, define (listing the covariant arguments
first to simplify the notation)


    $[\![f]\!]^c((p_1,n_1),\ldots,(p_k,n_k)) = \left( \sum_{i=1}^{l} p_i + \sum_{i=l+1}^{k} n_i,\ \sum_{i=1}^{l} n_i + \sum_{i=l+1}^{k} p_i \right)$    (81)


Let $U$ be the subset of $\{(p,n) \mid p, n \ge 0\}$ generated from
element $(1,0)$ using operations $[\![f]\!]^c$ for $f \in \Sigma$. Then


    $S = \{\, (p,n) \in U \mid c(p,n) \,\}$    (82)


where $c(p,n)$ is the linear constraint corresponding to the
constraint $C(s)$.

Let $C(s) = C_{k_P,k_N}(s)$. Then $S \subseteq \{(p,n) \mid p + n =
k_P + k_N\}$. $S$ is therefore a subset of a finite set and is easily
computable, which solves case F1).

Let $C(s) = C_{k_P^+,k_N^+}(s)$. Because $\Sigma$ contains a binary
constructor, $S$ contains pairs $(p,n)$ with arbitrarily large $p + n$,
so either the $p$ component or the $n$ component of elements of
$S$ grows unboundedly. Because $\Sigma$ contains a constructor $f$
contravariant in some argument, we can define using $f$ an
operation $o$ acting as a constructor covariant in at least one
argument and contravariant in at least one argument. Using
operation $o$ on tuples one of whose components grows
unboundedly yields tuples both of whose components grow
unboundedly. Therefore, $S$ is infinite, which solves case F4).

Finally, consider the case $C(s) = C_{k_P,k_N^+}(s)$ (this will
solve the case $C(s) = C_{k_P^+,k_N}(s)$ as well). Observe that


    $C_{k_P,k_N^+}(s) = C_{k_P,0^+}(s) \wedge \bigwedge_{i=0}^{k_N-1} \neg C_{k_P,i}(s)$    (83)


Because the set $S$ for each $C_{k_P,i}(s)$ is finite, it suffices to
finitely compute $S$ for $C_{k_P,0^+}(s)$. In that case


    $S = \{\, (p,n) \in U \mid p = k_P \,\}$    (84)

Let

    $S_i = \{\, (p,n) \in U \mid p = i \,\}$
    $T_i = \{\, (p,n) \in U \mid n = i \,\}$    (85)


To finitely compute $S$, finitely compute the sets $S_i$ and $T_i$
for $0 \le i \le k_P$. The algorithm starts with all sets $S_i$ and $T_i$
empty and keeps adding elements according to operations
$[\![f]\!]^c$.

Assume that $S_0, T_0, \ldots, S_{i-1}, T_{i-1}$ are finitely computed.
The computation of $S_i$ and $T_i$ proceeds as follows. Let $f \in \Sigma$
be a constructor of arity $k$ with $l$ covariant arguments. For
$S_i$, we consider all solutions of the equation

    $p_1 + \ldots + p_l + n_{l+1} + \ldots + n_k = i$    (86)


for nonnegative integers $p_1, \ldots, p_l, n_{l+1}, \ldots, n_k$. First
consider solutions where no variable is equal to $i$. If for
one of the solutions, one of the sets $S_{p_1}, \ldots, S_{p_l}$ is infinite,
then $S_i$ is infinite, otherwise add to $S_i$ all elements $(i, n)$
where


    $n = n_1 + \ldots + n_l + p_{l+1} + \ldots + p_k$    (87)


If $n \le k_P$ then also add the same elements $(i, n)$ to $T_n$.
Next, proceed analogously with $T_i$, considering solutions of


    $n_1 + \ldots + n_l + p_{l+1} + \ldots + p_k = i$    (88)


If at this point $S_i$ is not infinite and not empty, then also
consider the solutions of (86) where $p_j = i$ for some $j$. If
such a solution exists, then mark $S_i$ as infinite. Proceed
analogously with $T_i$. Finally, if both $S_i$ and $T_i$ are still finite
but there exists a solution for $S_i$ where $n_j = i$ for some $j$
and there exists a solution for $T_i$ where $p_d = i$ for some $d$, then
mark both $S_i$ and $T_i$ as infinite. This completes the sketch
of one step of the computation. (This step also applies to
$S_0$ and $T_0$; we initially assume that $(1,0) \in T_0$.) ∎


Example 69 Let us apply this algorithm to the special case
where $\Sigma = \{f, g\}$ and


    variance($g$) = (1, 1)
    variance($f$) = (−1, 1)


Let us see what the set $S$ looks like. If $(x,y) \in S$ define
$k(x,y) = (kx, ky)$ as in a vector space.

First, $(1,0) \in S$ because of $c^S$. Next $(1,1) \in S$ because
of $f^S$ and $(2,0) \in S$ because of $g^S$.

More generally, we have the following composition rule:
If $(p_1,n_1), (p_2,n_2) \in S$ then


    $(p_1 + p_2,\ n_1 + n_2) \in S$
because of $g^S$, and
    $(n_1 + p_2,\ p_1 + n_2) \in S$


because of $f^S$.

Using $g^S$ we obtain all pairs $(p,0)$ for $p \ge 1$. Using $f^S$
once on those we obtain $(1,n)$ for $n \ge 0$. Adding these we
additionally obtain $(p,n)$ for $p \ge 2$ and $n \ge 0$. Hence we
have all pairs $(p,n)$ for $p \ge 1$ and $n \ge 0$ and those are the
only ones that can be obtained. Thus,


    $S = \{\, (p,n) \mid p \ge 1 \wedge n \ge 0 \,\}$


As expected, the case F1) yields a finite and the case F4)
an infinite set. The case F2) for $k_P = 0$ is an empty set,
otherwise it is an infinite set. The case F3) always yields
an infinite set. This solves the problem for two constructors
$f$, $g$.
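The closure described in Example 69 can be checked mechanically for small sizes. The following Python sketch generates, from $(1,0)$, all pairs $(p,n)$ with $p + n$ below a bound, using one pair operation per constructor as in (81). The function name `generated_pairs` and the dictionary encoding of variances are assumptions made for this illustration; truncating at a bound is sound here because every operation adds up the sizes $p + n$ of its arguments.

```python
from itertools import product

def generated_pairs(variances, bound):
    """All pairs (p, n) with p + n <= bound generated from (1, 0).

    variances maps each constructor name to a tuple of +1 / -1 entries,
    one per argument (+1 covariant, -1 contravariant).
    """
    pairs = {(1, 0)}
    while True:
        new = set()
        for signs in variances.values():
            for args in product(pairs, repeat=len(signs)):
                # A covariant argument contributes (p_i, n_i),
                # a contravariant argument contributes (n_i, p_i).
                p = sum(a[0] if v == 1 else a[1] for a, v in zip(args, signs))
                n = sum(a[1] if v == 1 else a[0] for a, v in zip(args, signs))
                if p + n <= bound:
                    new.add((p, n))
        if new <= pairs:
            return pairs
        pairs |= new
```

With the variances of Example 69 and a bound of 8, the result is exactly the set of pairs with $p \ge 1$, $n \ge 0$ and $p + n \le 8$, in agreement with the set $S$ computed above.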


Lemma 68 allows us to carry out the proof of Proposition 66,
so we obtain our main result for finite $C$.


Theorem 70 (Term Power Quant. Elimination)
There exists an algorithm that for a given well-defined
formula $\phi$ produces a quantifier-free formula $\phi'$ that is
equivalent to $\phi$ on Pr.


Corollary 71 (Decidability of Structural Subtyping)
Let $C$ be a structure with a finite carrier and P a $\Sigma$-term-power
of $C$. Then the first-order theory of P is decidable.


6 Term-Powers of Decidable Theories 


In this section we extend the result of Section 5 on decidability
of term-powers of a base structure $C$ to allow $C$ to be an
arbitrary decidable theory, even if the carrier $C$ is infinite.

To keep a finite language in the case when $C$ is infinite,
we introduce a predicate $\mathrm{Is}_{PRI}$ that allows testing whether
$t \in C$ for a term $t \in$ P.

In structural base formulas, we now distinguish between
1) composed variables, denoting elements $t \in$ P for which
$\mathrm{Is}_f(t)$ holds for some constructor $f \in \Sigma$, and 2) primitive
variables, denoting elements $t \in$ P for which $\mathrm{Is}_{PRI}(t)$ holds.

Another generalization compared to Section 5 is the use
of a syntactically richer language for term power algebras; to
some extent this richer language can be viewed as syntactic
sugar and can be simplified away.

The generalization to infinitely many primitive types and
the generalization to a richer language are orthogonal.

For most of the section we focus on covariant constructors;
Section 6.5 discusses a generalized notion of variance.

As in Section 3.3 let $C = (C, R)$ be a decidable structure
where $C$ is a non-empty set and $R$ is a set of relations
interpreting some relational language $L_C$, such that each $r \in R$
is a relation of arity $\mathrm{ar}(r)$ on set $C$, i.e. $r \subseteq C^{\mathrm{ar}(r)}$. We
assume that $R$ contains a binary relation symbol $r_{=} \in R$,
interpreted as equality on the set $C$.


lifted relations $r'$ for $r \in L_C$:
    $r'$ :: term^k → bool

term algebra on terms
  constructors, $f \in \Sigma$:
    $f$ :: term^k → term
  constructor test, $f \in \Sigma$:
    $\mathrm{Is}_f$ :: term → bool
  selectors, $f \in \Sigma$:
    $f_i$ :: term → term


Figure 8: Basic Operations of $\Sigma$-term-power Structure


Operations and relations of the $\Sigma$-term-power structure
are summarized in Figure 8. We will show the decidability of
the first-order theory of the structure with these operations.

In the special case when $C = \{a, b\}$ and


    $r = \{(a,a), (a,b), (b,b)\}$


we obtain the theory in Section 4. When $R = \{r\}$ where $r$ is
a partial order on types, we obtain the theory of structural
subtyping of non-recursive covariant types. For arbitrary
relational structure $C$, if $f \in \Sigma$ for $\mathrm{ar}(f) = k$ we obtain a
structure that properly contains the $k$-th strong power of
structure $C$, in the terminology of [35].

The structure of this section follows Section 4. We also
associate a boolean algebra of sets with each term $t$. However,
in this case, the elements of the associated boolean
algebra are sets of occurrences of the constants that satisfy
the given first-order formula interpreted over $C$. The
occurrences of constants within the terms of a given shape
correspond to the indices of the product structure in Section 3.3.
We call these occurrences leaves, because they can
be represented as leaves of the tree corresponding to a term.


6.1 Product Theory of Terms of a Given Shape 


In this section we define the notions shape and leafset, and
state some properties that we use in the sequel.
Let
    $\Sigma_0 = \{c^S\} \cup \{f^S \mid f \in \Sigma\}$
be a set of function symbols such that $c^S$ is a fresh constant
symbol with $\mathrm{ar}(c^S) = 0$ and $f^S$ are fresh distinct function
symbols with $\mathrm{ar}(f^S) = \mathrm{ar}(f)$ for each $f \in \Sigma$. Let shapified :
$\Sigma' \to \Sigma_0$ be defined by


    shapified($x$) = $c^S$,    if $x \in C$
    shapified($f$) = $f^S$,    if $f \in \Sigma$


Let $FT(\Sigma_0)$ be the set of ground terms with signature $\Sigma_0$
and $FT(\Sigma')$ the set of ground terms of signature $\Sigma'$.


Define function sh :: $FT(\Sigma') \to FT(\Sigma_0)$ mapping each
term to its shape by


    $\mathrm{sh}(f(t_1,\ldots,t_n)) = \mathrm{shapified}(f)(\mathrm{sh}(t_1),\ldots,\mathrm{sh}(t_n))$


for each $f \in \Sigma'$. Define $t_1 \sim t_2$ iff $\mathrm{sh}(t_1) = \mathrm{sh}(t_2)$.

Let $t$ be a term or shape and $t'$ the tree representing $t$ as
in Section 2.2. If $p$ is a path such that $t'(p)$ is defined and
denotes a constant, we write $t[p]$ to denote $t'(p)$ and call $p$ a
leaf. Note that $t[p]$ is defined iff $\mathrm{sh}(t)[p]$ is defined. On the
set of equivalent terms leaves act as indices of Section 3.3. If
$s$ is a shape, let leaves($s$) denote the set of all leaves defined
on shape $s$.
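The definitions of sh, leaves and $t[p]$ can be made concrete with a few lines of Python. The sketch below is an illustration under assumed encodings that are not part of the paper: elements of $C$ are Python strings, a composed term is a tuple `(f, t_1, ..., t_k)`, the shape constant is the string `"c^S"`, and a leaf is a tuple of 1-based argument indices.

```python
SHAPE_CONST = "c^S"   # stands for the shape constant

def sh(t):
    """Shape of a term: constants become c^S, constructors f become f^S."""
    if isinstance(t, str):
        return SHAPE_CONST
    f, *args = t
    return (f + "^S", *map(sh, args))

def leaves(t):
    """All paths (tuples of 1-based argument indices) that end in a constant."""
    if isinstance(t, str):
        return [()]
    return [(i, *p) for i, arg in enumerate(t[1:], start=1) for p in leaves(arg)]

def at(t, path):
    """t[p]: the constant reached by following path p from the root of t."""
    for i in path:
        t = t[i]
    return t
```

Since `sh` only renames symbols, `leaves(t)` and `leaves(sh(t))` coincide, which is the observation that $t[p]$ is defined iff $\mathrm{sh}(t)[p]$ is defined.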

Generalizing tCont of Section 4.1, define function tCont :
$FT(\Sigma') \to C^*$ by:


    tCont($c$) = $c$,    if $c \in C$
    tCont($f(t_1,\ldots,t_k)$) = tCont($t_1$) · ... · tCont($t_k$)

Define $\delta(t) = (\mathrm{sh}(t), \mathrm{tCont}(t))$ and


    $B = \{\, (s,w) \mid s \in FT(\Sigma_0),\ w \in C^*,\ \mathrm{tLen}(s) = \mathrm{sLen}(w) \,\}$


If all constructors $f \in \Sigma$ are covariant then $\delta$ is a bijection
between $FT(\Sigma')$ and $B$. Let


    $B(s_0) = \{\, (s,w) \in B \mid s = s_0 \,\}$


For a fixed $s_0$, the set $B(s_0)$ is isomorphic to the power
structure $C^n$ where $n = \mathrm{tLen}(s_0)$.

For each shape $s$ we introduce operations from Section 3.3.
To distinguish the sets of positions belonging to
different shapes, we tag each set of positions $L$ with a shape
$s$. We call the pair $(s, L)$ a leafset. The interpretation of
each relation $r \in L_C$ is the leafset:


    $[\![r_s]\!](t_1,\ldots,t_k) = (s, \{\, l \mid [\![r]\!]^C(t_1[l],\ldots,t_k[l]) \,\})$


We let $\wedge^L_s$, $\vee^L_s$, $\neg^L_s$, $\mathrm{true}^L_s$, $\mathrm{false}^L_s$ stand for intersection,
union, complement, full set and empty set in the algebra of subsets
of the set leaves($s$). We also introduce $\exists^L_s$ as the union of a
family of subsets indexed by a term of shape $s$ and $\forall^L_s$ as the
intersection of a family of subsets indexed by a term.

We use constructor-selector language for the term algebra
on terms. We introduce constructor-selector language
on shapes by generalizing operations in Section 4.1 in a natural
way. In addition, we introduce a constructor-selector
language on leafsets. For each $f \in \Sigma$ we introduce a
constructor symbol $f^L$ on leafsets and define


    leafified($f$) = $f^L$


Constructors $f^L$ act on leafsets as follows. If $L_i \subseteq \mathrm{leaves}(s_i)$
for $1 \le i \le k$ define


    $f^L((s_1,L_1),\ldots,(s_k,L_k)) = (s, L)$

where $s = f^S(s_1,\ldots,s_k)$, and $L \subseteq \mathrm{leaves}(s)$ is given by

    $L = (\{1\} \cdot L_1) \cup \ldots \cup (\{k\} \cdot L_k)$


(Here we define $A \cdot B = \{\, a \cdot b \mid a \in A \wedge b \in B \,\}$.)

We define selector functions on leafsets as follows. If $s =
f^S(s_1,\ldots,s_k)$ and $L \subseteq \mathrm{leaves}(s)$, then $f^L_i((s,L)) = (s_i, L_i)$
where $L_i \subseteq \mathrm{leaves}(s_i)$ is defined by


    $L_i = \{\, w \mid i \cdot w \in L \,\}$

Equivalently, we require that


    $f^L_i(f^L((s_1,L_1),\ldots,(s_k,L_k))) = (s_i, L_i)$

We can now express relations $r'$ in Figure 8 using the fact:


    $r'(t_1,\ldots,t_k) \iff \mathrm{sh}(t_2) = \mathrm{sh}(t_1) \wedge \ldots \wedge \mathrm{sh}(t_k) = \mathrm{sh}(t_1) \wedge r_{\mathrm{sh}(t_1)}(t_1,\ldots,t_k) = \mathrm{true}^L_{\mathrm{sh}(t_1)}$    (89)


To handle an infinite number of elements of the base
structure $C$, we do not introduce into the language constants
for every element of $C$ as in Section 5. Instead, we introduce
the predicate $\mathrm{Is}_{PRI}$ :: term → bool called primitive-term test
that checks whether a term is a constant:


    $\mathrm{Is}_{PRI}(x) = (x \in C)$

and the predicate $\mathrm{Is}^L_{PRI}$ :: leafset → bool called
primitive-leafset test:

    $\mathrm{Is}^L_{PRI}((s,L)) = (s = c^S)$

Instead of the rule (16), we have for $f, g \in \Sigma \cup \{PRI\}$:


    $\forall x.\ \bigvee_{f \in \Sigma \cup \{PRI\}} \mathrm{Is}_f(x)$
    $\forall x.\ \neg(\mathrm{Is}_f(x) \wedge \mathrm{Is}_g(x))$,    for $f \ne g$    (90)

Analogous rules hold for term algebra of leafsets:

    $\forall x.\ \bigvee_{f \in \Sigma \cup \{PRI\}} \mathrm{Is}^L_f(x)$
    $\forall x.\ \neg(\mathrm{Is}^L_f(x) \wedge \mathrm{Is}^L_g(x))$,    for $f \ne g$    (91)

Term algebra of shapes satisfies the original rules of
term algebra.

6.2 A Logic for Term-Power Algebras


To show the decidability of the first-order theory of the
structure FT with operations in Figure 8, we show decidability
for a richer structure. Figure 9 shows the operations
and relations of this richer structure.

The structure has four sorts: bool representing truth values,
term representing terms, shape representing shapes, and
leafset representing sets of leaves within a given shape. The
structure can be seen as a combination of the operations
of Figure 5 and Figure 2.

For each relation symbol $r \in R$ we define a relation symbol
$r_{\_}$ of sort shape × term^k → bool acting on terms of the
same shape. While in Section 4.2 we associate a boolean
algebra with the terms of the same shape, in this section we
associate a cylindric algebra [21] with terms of the same
shape. This is a particularly simple cylindric algebra
resulting from lifting first-order logic on the base structure
$C$ so that elements are replaced by terms of a given shape
(which are isomorphic to functions from leaves to elements),
and boolean values are replaced by sets of leaves (isomorphic
to functions from leaves to booleans). In both cases,
operations on the set $X$ are lifted to operations on the set
leaves($s$) → $X$. Syntactically, we introduce a copy of all
propositional connectives and quantifiers: $\wedge^L$, $\vee^L$, $\neg^L$, $\mathrm{true}^L$,
$\mathrm{false}^L$, $\exists^L$, $\forall^L$. Like boolean algebra operations in Figure 5, these
syntactic constructs in Figure 9 take an additional shape
argument, because the term-power algebra contains one copy of
a strong power $C^n$ of the base structure for each shape. We call
formulas built using the operations of the cylindric algebra
inner formulas.


per-shape product structure
  inner formula relations for $r \in L_C$:
    $r_{\_}$ :: shape × term^k → leafset
  inner logical connectives:
    $\wedge^L, \vee^L$ :: shape × leafset × leafset → leafset
    $\neg^L$ :: shape × leafset → leafset
    $\mathrm{true}^L, \mathrm{false}^L$ :: shape → leafset
  inner formula quantifiers:
    $\exists^L, \forall^L$ :: shape × (term → leafset) → leafset
  leafset equality:
    $=^L$ :: leafset × leafset → bool
  leafset cardinality constraints, $k \ge 0$:
    $|\_| \ge k$, $|\_| = k$ :: shape × leafset → bool
  leafset quantifiers:
    $\exists^L, \forall^L$ :: (leafset → bool) → bool
  term equality:
    $=$ :: term × term → bool
  term quantifiers:
    $\exists, \forall$ :: (term → bool) → bool
  shape equality:
    $=^S$ :: shape × shape → bool
  shape quantifiers:
    $\exists^S, \forall^S$ :: (shape → bool) → bool
  logical connectives:
    $\wedge, \vee$ :: bool × bool → bool
    $\neg$ :: bool → bool
    true, false, undef :: bool

term algebra on terms
  constructors, $f \in \Sigma$:
    $f$ :: term^k → term
  constructor test, $f \in \Sigma$:
    $\mathrm{Is}_f$ :: term → bool
  primitive-term test:
    $\mathrm{Is}_{PRI}$ :: term → bool
  selectors, $f \in \Sigma$:
    $f_i$ :: term → term
  term shape:
    sh :: term → shape

term algebra on leafsets
  constructors, $f \in \Sigma$:
    $f^L$ :: leafset^k → leafset
  constructor test, $f \in \Sigma$:
    $\mathrm{Is}^L_f$ :: leafset → bool
  primitive-leafset test:
    $\mathrm{Is}^L_{PRI}$ :: leafset → bool
  selectors, $f \in \Sigma$:
    $f^L_i$ :: leafset → leafset
  leafset shape:
    Issh :: leafset → shape

term algebra on shapes
  constructors, $f \in \Sigma_0$:
    $f^S$ :: shape^k → shape
  constructor test, $f \in \Sigma_0$:
    $\mathrm{Is}_{f^S}$ :: shape → bool
  selectors, $f \in \Sigma_0$:
    $f^S_i$ :: shape → shape


Figure 9: Operations and relations in structure P

For each operation in Figure 2 there is an operation in
Figure 9, potentially taking a shape as an additional argument
(for operations used to build inner formulas). The logic
further contains term algebra operations on terms, leafsets,
and shapes.

We use undecorated identifiers (e.g. $u$) to denote variables
of term sort, variables with superscript S to denote
shape variables (e.g. $u^S$) and variables with superscript L to
denote leafset variables (e.g. $u^L$).

Figures 10 and 11 show the semantics of the logic in Figure 9.
The first column specifies the semantics of operations in the
case when all arguments are defined and are in the domain
of the operation. The domain of each operation is in the
second column; it is omitted if it is equal to the entire
domain resulting from interpreting the sort of the operation.
All operations except for plain logical operations and
quantifiers over the bool domain are strict. Logical operations
and quantifiers over the bool domain are defined as in the
three-valued logic of Section 2.3.

We remark that values of leafset act as terms with two
constants in Figure 5. In fact, if the base structure $C$ has
only two constants then the formula $x = a$ and its propositional
combinations are sufficient to express all facts about
$C$, so in that case there is no need to distinguish between
terms and leafsets.


6.3 Some Properties of Term-Power Structure


In this section we establish some further properties of the
term-power structure, including the homomorphism properties
between the term algebra of terms and the term algebra
of leafsets. We also argue that it suffices to consider a
restricted class of formulas called simple formulas.

Recall that $r_{=} \in R$ is the equality relation on $C$. Given
$r_{=}$, we can express the equality between terms by:

    $t_1 = t_2 \iff r_{=}'(t_1, t_2) \iff \mathrm{sh}(t_2) = \mathrm{sh}(t_1) \wedge r_{=\,\mathrm{sh}(t_1)}(t_1, t_2) = \mathrm{true}^L_{\mathrm{sh}(t_1)}$    (92)

We define the notion of a $u^S$-term as before,
except that we use different symbols for boolean algebra
operations.


Definition 72 ($u^S$-terms) Let $u^S \in \mathrm{Var}^S$ be a shape variable.
The set of $u^S$-terms $\mathrm{Term}(u^S)$ is the least set such that:


1. $u^L \in \mathrm{Term}(u^S)$ for every leafset variable $u^L$;
2. $\mathrm{false}^L_{u^S}, \mathrm{true}^L_{u^S} \in \mathrm{Term}(u^S)$;
3. if $t^L_1, t^L_2 \in \mathrm{Term}(u^S)$, then also
   $t^L_1 \wedge^L_{u^S} t^L_2 \in \mathrm{Term}(u^S)$,
   $t^L_1 \vee^L_{u^S} t^L_2 \in \mathrm{Term}(u^S)$, and
   $\neg^L_{u^S} t^L_1 \in \mathrm{Term}(u^S)$.

If $t^S$ is a term of shape sort, the notion of $t^S$-inner formula
is defined as follows.

Definition 73 ($u^S$-inner formula) Let $u^S \in \mathrm{Var}^S$ be a
shape variable. The set of $u^S$-inner formulas $\mathrm{Inner}(u^S)$ is
the least set such that:

1. if $u_1, \ldots, u_k$ are term variables and $r \in L_C$ such that
   $\mathrm{ar}(r) = k$, then
   $r_{u^S}(u_1,\ldots,u_k) \in \mathrm{Inner}(u^S)$;

2. $\mathrm{false}^L_{u^S}, \mathrm{true}^L_{u^S} \in \mathrm{Inner}(u^S)$;

3. if $\phi_1, \phi_2 \in \mathrm{Inner}(u^S)$ then also
   $\phi_1 \wedge^L_{u^S} \phi_2 \in \mathrm{Inner}(u^S)$,
   $\phi_1 \vee^L_{u^S} \phi_2 \in \mathrm{Inner}(u^S)$, and
   $\neg^L_{u^S} \phi_1 \in \mathrm{Inner}(u^S)$;

4. if $\phi \in \mathrm{Inner}(u^S)$ and $u$ is a term variable that does not
   occur in $u^S$, then also
   $\exists^L_{u^S} u.\phi \in \mathrm{Inner}(u^S)$ and
   $\forall^L_{u^S} u.\phi \in \mathrm{Inner}(u^S)$.


If $\phi \in \mathrm{Inner}(u^S)$ and $u_1, \ldots, u_n$ is the set of free term
variables of $\phi$, we write $\phi(u^S, u_1, \ldots, u_n)$ for $\phi$. Furthermore, if
$t^S$ is a term of shape sort and $t_1, \ldots, t_n$ terms of term sort,
we write $\phi(t^S, t_1, \ldots, t_n)$ for


    $\phi[u^S := t^S, u_1 := t_1, \ldots, u_n := t_n]$


where we assume that variables bound by $\exists^L$ and $\forall^L$ are
renamed to avoid the capture of variables that are free in
$t^S, t_1, \ldots, t_n$.

We call $\phi(t^S, t_1, \ldots, t_n)$ an instance of the $u^S$-inner
formula $\phi(u^S, u_1, \ldots, u_n)$.

If $\phi(u^S, u_1, \ldots, u_n)$ is an inner formula, we abbreviate
it by writing $[\phi'(u_1,\ldots,u_n)]_{u^S}$ where $\phi'$ results from
$\phi(u^S, u_1, \ldots, u_n)$ by omitting the shape argument $u^S$ from
the operations occurring in $\phi(u^S, u_1, \ldots, u_n)$. Similarly, we
write $[\phi'(t_1,\ldots,t_n)]_{t^S}$ for $\phi(t^S, t_1, \ldots, t_n)$.


According to the semantics in Figure 10, sh is a homomorphism
from the term algebra of terms to the term algebra
of shapes. In addition, Issh is a homomorphism from the
term algebra of leafsets to the term algebra of shapes.

We also have the following important property. Let $r \in
L_C$ be a relation symbol of arity $n$, let $f \in \Sigma$ be a function
symbol of arity $k$, and let


    $\mathrm{sh}(t_{1j}) = \ldots = \mathrm{sh}(t_{nj}) = s_j$


for $1 \le j \le k$. If $f^S = \mathrm{shapified}(f)$, $f^L = \mathrm{leafified}(f)$, and
$s = f^S(s_1,\ldots,s_k)$ then


    $r_s(f(t_{11},\ldots,t_{1k}),\ldots,f(t_{n1},\ldots,t_{nk})) = f^L(r_{s_1}(t_{11},\ldots,t_{n1}),\ldots,r_{s_k}(t_{1k},\ldots,t_{nk}))$    (93)
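Property (93) can be checked on a small instance. The following Python sketch does so for covariant constructors only, using the two-element ordering $r = \{(a,a),(a,b),(b,b)\}$ mentioned earlier in this section. The encodings are assumptions made for illustration (constants are strings, composed terms are tuples `(f, t_1, ..., t_k)`, leaves are tuples of 1-based indices), and only the set-of-leaves component of each leafset is computed, since the shape component is determined by sh.

```python
ORDER = {("a", "a"), ("a", "b"), ("b", "b")}   # the ordering r on C = {a, b}

def leaves(t):
    """All paths (tuples of 1-based argument indices) that end in a constant."""
    if isinstance(t, str):
        return [()]
    return [(i, *p) for i, arg in enumerate(t[1:], start=1) for p in leaves(arg)]

def at(t, path):
    """t[p]: the constant reached by following path p."""
    for i in path:
        t = t[i]
    return t

def rel_leafset(rel, *terms):
    """Leaves at which rel holds pointwise; all terms must share one shape."""
    return frozenset(p for p in leaves(terms[0])
                     if tuple(at(t, p) for t in terms) in rel)

def lift(paths_per_argument):
    """Leaf component of f^L: prefix the leaves of the i-th argument with i."""
    return frozenset((i, *p)
                     for i, L in enumerate(paths_per_argument, start=1)
                     for p in L)
```

Evaluating the relation on the composed terms and combining the results for the arguments with `lift` give the same set of leaves, as (93) states.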


Furthermore, if Issh(/;) = Issh(l;) = s; for 1 < j < k and 


interpretation of sorts 


[term] = FT(»’) 
[shape] = FT(%Xo) 
[leafset] = {(s,L) | L C leaves(s)} 
[bool] = {true, false, undef} 
semantics 


well-definedness 


inner formula relations for r € Lo: 


Ir](s,t1,.-.,th) = 
inner logical connectives: 
[A'I(s, (s1, £1), (82, L2)) = 
[V'I(s, (s1, £1), (s2,L2)) = 

[-'I(s, (s1,L1)) = 

[true'](s) = 
[false'](s) = 


inner formula quantifiers, for h : term 


Ww 


[F'l(s, A) 
IV'(s,.k) = 

leafset equality: 

[="]((s1, L1), (s2,L2)) = 


(s, {| Dr] tld), ---, tell))}) 


8,11 Le) 
8, Ly U Le) 
) 
) 


8, leaves(s) \ L1) 


ee RS, 


8, leaves(s)) 

(s,0) 

— |leafset]: 

(s, U{L | St € [term]. sh(¢) = s A h(t) = (s, L)}) 
(Q{E | dt € [term]. sh(¢) = s A h(t) = (s, L)},) 


81 = 582A Ll, =L2 


leafset cardinality constraints: 


[l(s1,L1)|s 2k] = 
[l(s1,L1)|s =k] = 
leafset quantifiers, for h : 
Bh = 
[V]h = 


term equality: 


[=] 


,t2) = 


(|L£i| = k) 
(|Li| =k) 


leafset] — [bool]: 


(s, t) € [leafset]. h((s, t)) 
V(s,t) € [leafset]. h((s, ¢)) 


Ww 


(t1 = te) 


term quantifiers, for h : [term] — [bool]: 


ah = 
VJh = 


shape equality: 


[="] (ti, t) 


t € [term]. h(t) 
Vt € [term]. h(t) 


WwW 


(ti = t3) 


shape quantifiers, for h : [shape] — [bool]: 


[3]h = 
[Vn = 


WwW 


t € [shape]. h(E) 
Vt € [shape]. h(£) 


sh(t;) =sA...Ash(tx) =s 
81: =S/AS82=8 


Si =S/AS82=8 


si =S 


Vt € [term]. Issh(h(t)) = s 
Vt € [term]. Issh(h(t)) = s 


si =—S 


Figure 10: Semantics for Logic of Term-Power Algebra (Part I) 
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semantics well-definedness 


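term algebra on terms

constructors, $f \in \Sigma$:
$[f](t_1, \ldots, t_k) = f(t_1, \ldots, t_k)$
constructor test, $f \in \Sigma$:
$[\mathsf{Is}_f](t) = \exists t_1, \ldots, t_k.\ t = f(t_1, \ldots, t_k)$
primitive-term test:
$[\mathsf{Is}_{\mathsf{PRIM}}](t) = (t \in C)$
selectors, $f \in \Sigma$:
$[f_i](t) = \varepsilon t_i.\ t = f(t_1, \ldots, t_i, \ldots, t_k)$, provided $[\mathsf{Is}_f](t)$
term shape:
$[\mathsf{sh}](f(t_1, \ldots, t_n)) = \mathsf{shapified}(f)(\mathsf{sh}(t_1), \ldots, \mathsf{sh}(t_n))$

term algebra on leafsets

constructors, $f \in \Sigma$:
$[f^L]((s_1, L_1), \ldots, (s_k, L_k)) = (f^s(s_1, \ldots, s_k),\ (\{1\} \cdot L_1) \cup \cdots \cup (\{k\} \cdot L_k))$
constructor test, $f \in \Sigma$:
$[\mathsf{Is}_{f^L}]((s, L)) = \exists s_1, L_1, \ldots, s_k, L_k.\ (s, L) = [f^L]((s_1, L_1), \ldots, (s_k, L_k))$
primitive-leafset test:
$[\mathsf{Is}_{\mathsf{PRIM}^L}]((s, L)) = (s = c^s)$
selectors, $f \in \Sigma$:
$[f^L_i]((s, L)) = \varepsilon (s_i, L_i).\ (s, L) = [f^L]((s_1, L_1), \ldots, (s_i, L_i), \ldots, (s_k, L_k))$, provided $[\mathsf{Is}_{f^L}]((s, L))$
leafset shape:
$[\mathsf{lssh}]((s, L)) = s$

term algebra on shapes

constructors, $f \in \Sigma$:
$[f^s](s_1, \ldots, s_k) = f^s(s_1, \ldots, s_k)$
constructor test, $f \in \Sigma_0$:
$[\mathsf{Is}_{f^s}](s) = \exists s_1, \ldots, s_k.\ s = f^s(s_1, \ldots, s_k)$
selectors, $f \in \Sigma$:
$[f^s_i](s) = \varepsilon s_i.\ s = f^s(s_1, \ldots, s_i, \ldots, s_k)$, provided $[\mathsf{Is}_{f^s}](s)$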


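Figure 11: Semantics for Logic of Term-Power Algebra (Part II)
$s = f^s(s_1, \ldots, s_k)$ then

$$f^L(l_1, \ldots, l_k) \wedge^L_s f^L(l'_1, \ldots, l'_k) = f^L(l_1 \wedge^L_{s_1} l'_1, \ldots, l_k \wedge^L_{s_k} l'_k)$$
$$f^L(l_1, \ldots, l_k) \vee^L_s f^L(l'_1, \ldots, l'_k) = f^L(l_1 \vee^L_{s_1} l'_1, \ldots, l_k \vee^L_{s_k} l'_k)$$
$$\neg^L_s f^L(l_1, \ldots, l_k) = f^L(\neg^L_{s_1} l_1, \ldots, \neg^L_{s_k} l_k)$$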
pt. ft (hi(t),...,he(t)) =" 
Fp ER Dye cay a eat) 
Vit. f'(hi(t),---,Ax(t)) =! 
PV elit Do aay Cyt) 


ae 


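From these properties we conclude by induction that if $\phi(u^s, u_1, \ldots, u_n)$ is an inner formula, then

$$\phi(s, f(t_{11}, \ldots, t_{1k}), \ldots, f(t_{n1}, \ldots, t_{nk})) = f^L(\phi(s_1, t_{11}, \ldots, t_{n1}), \ldots, \phi(s_k, t_{1k}, \ldots, t_{nk})) \qquad (95)$$

Let $\phi(u^s, u_1, \ldots, u_n)$ be an inner formula and let $\phi'(u_1, \ldots, u_n)$ be a first-order formula that results from replacing operations $\wedge^L_s, \vee^L_s, \neg^L_s, \forall^L_s, \exists^L_s$ by $\wedge, \vee, \neg, \forall, \exists$. Interpreting $\phi'(u_1, \ldots, u_n)$ over the structure $C$ yields a relation $\rho \subseteq C^n$. If

$$\mathsf{sh}(t_1) = \ldots = \mathsf{sh}(t_n) = s$$

then

$$[\phi](s, t_1, \ldots, t_n) = (s, \{\, l \mid \rho(t_1[l], \ldots, t_n[l]) \,\})$$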


The following Definition 75 introduces a more restricted set of formulas than the set of formulas permitted by sort declarations in Figure 9. We call this restricted set of formulas simple formulas. One of the main properties of simple formulas compared to arbitrary formulas is that simple formulas allow the use of operations $\exists^L$, $\forall^L$, and relations $r_s$, $r \in L_C$, only within instances of $u^s$-inner formulas.


Definition 74 A simple operation is any operation or relation in Figure 9 except for operations $\exists^L$, $\forall^L$, and relations $r_s$ for $r \in L_C$.


Definition 75 (Simple Formulas) The set of simple formulas is the least set that satisfies the following.


1. if $\phi(u^s, u_1, \ldots, u_n)$ is an inner formula, $t^s$ a term of shape sort, $t_1, \ldots, t_n$ terms of term sort and $u^L$ is a leafset variable, then

$$u^L =^L \phi(t^s, t_1, \ldots, t_n)$$

is a simple formula.


2. applying simple operations to simple formulas yields simple formulas.


Example 76 A formula

$$u^L =^L \exists^L_{u^s_1} u.\ r_{u^s_2}(u, u) \qquad (96)$$

is not a simple formula for $u^s_1 \not\equiv u^s_2$. Formula

$$(u^s_1 =^s u^s_2 \wedge u^L =^L \exists^L_{u^s_1} u.\ r_{u^s_1}(u, u)) \vee (u^s_1 \neq^s u^s_2 \wedge \mathsf{undef})$$

is a simple formula equivalent to formula (96). We abbreviate $\exists^L_{u^s} u.\ r_{u^s}(u, u)$ as $(\exists^L u.\ r(u, u))_{u^s}$.

The following lemma shows that for every formula in the logic of Figure 9 there exists an equivalent simple formula. Note that even simple formulas are sufficient to express the relations of structural subtyping. A reader not interested in the decidability of the more general logic of Figure 9 may therefore ignore Lemma 77.


Lemma 77 (Formula Simplification) For every well-defined formula in the logic of Figure 9 there exists an equivalent well-defined simple formula.


Proof Sketch. According to the definition of simple formula, we need to ensure that every occurrence of quantifiers $\forall^L$, $\exists^L$ and relations $r_s$ is an occurrence in some inner-formula instance $\phi(t^s, t_1, \ldots, t_n)$. Each occurrence $r_{t^s}(t_1, \ldots, t_n)$ is an inner formula instance by itself, so the main difficulty is fitting the quantifiers $\forall^L$ and $\exists^L$ into inner formulas.

Let us examine the syntactic structure of formulas of the logic in Figure 9. This syntactic structure is determined by sort declarations. Each expression of leafset sort is formed starting from


1. relations $r \in L_C$;
2. leafset variables;
3. $\mathsf{true}^L$, $\mathsf{false}^L$


using operations $\wedge^L$, $\vee^L$, $\neg^L$, $\forall^L$, $\exists^L$, as well as $f^L$ and $f^L_i$. The leafset expressions can be used in a formula in the following ways (in addition to constructing new leafset expressions):


1. to compare for equality using $=^L$;
2. to test for the top-level constructor using $\mathsf{Is}_{f^L}$;
3. to form leafset cardinality constraints;
4. to form a shape using $\mathsf{lssh}$.

Because the top-level sort of a formula is bool, every term $t^L$ of sort leafset occurs within some formula $t^L_1 =^L t^L_2$ or $\mathsf{Is}_{f^L}(t^L)$, $|t^L|_s = k$, $|t^L|_s \geq k$, or as part of some term $\mathsf{lssh}(t^L)$. We can replace $\mathsf{Is}_{f^L}(t^L)$ with

$$\exists^L u^L.\ u^L =^L t^L \wedge \mathsf{Is}_{f^L}(u^L)$$

according to Lemma 10, so we need not consider that case. We can similarly eliminate non-variable leafset terms from cardinality constraints. If a leafset term $t^L$ occurs in an expression $\mathsf{lssh}(t^L)$, we consider the smallest atomic formula $\psi(\mathsf{lssh}(t^L))$ enclosing $\mathsf{lssh}(t^L)$, and replace $\psi(\mathsf{lssh}(t^L))$ with

$$\exists^L u^L.\ u^L =^L t^L \wedge \psi(\mathsf{lssh}(u^L))$$

This transformation is valid by Lemma 10 because $\psi$ and $\mathsf{lssh}$ are strict.


We further assume that in every atomic formula $t^L_1 =^L t^L_2$, the term $t^L_1$ is a leafset variable.

Suppose that a term $t^L$ in a formula $u^L =^L t^L$ is not an instance of an inner formula. Then there are two possibilities.


1. There are some occurrences of leafset term algebra operations $f^L$, $f^L_i$ or leafset variables $u^L$ in $t^L$. Here by "occurrence" in $t^L$ we mean an occurrence that is reachable without going through a shape argument or a relation, but only through operations $\forall^L, \exists^L, \wedge^L, \vee^L, \neg^L$. For example, we ignore the occurrences of $f^L$, $f^L_i$ within terms $t^s$ that occur in $\wedge^L_{t^s}$.


2. Not all shape arguments in $\forall^L, \exists^L, \wedge^L, \vee^L, \neg^L, \mathsf{true}^L, \mathsf{false}^L, r_s$ occurring in $t^L$ are syntactically identical.


We eliminate the first possibility by propagating leafset term algebra operations $f^L$, $f^L_i$ inwards until they reach expressions of form $r_{t^s}(t_1, \ldots, t_n)$, applying the equations from left to right. We then convert $f^L$, $f^L_i$ operations of the term algebra of leafsets into operations of the term algebra of terms, applying the equations from right to left.

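To eliminate the second possibility, let $t^s_1, \ldots, t^s_m$ be the occurrences (reachable through $\mathsf{true}^L, \mathsf{false}^L, \wedge^L, \vee^L, \neg^L, \forall^L, \exists^L$) in term $t^L$ of the shape arguments of operations $\mathsf{true}^L, \mathsf{false}^L, \wedge^L, \vee^L, \neg^L, \forall^L, \exists^L$. Then replace

$$u^L =^L t^L(t^s_1, \ldots, t^s_m)$$

with

$$(\exists u^s.\ \forall^{St}_1 (u^s =^s t^s_1) \wedge \ldots \wedge \forall^{St}_m (u^s =^s t^s_m) \wedge u^L =^L t^L(u^s, \ldots, u^s)) \ \vee\ (\mathsf{undef} \wedge \bigvee_{1 \leq i < j \leq m} t^s_i \neq t^s_j)$$

Here $\forall^{St}_i$ denotes universal quantification $\forall u_{i,1}, \ldots, u_{i,n_i}$ where $u_{i,1}, \ldots, u_{i,n_i}$ is a list of those term variables occurring in $t^s_i$ that are bound by some quantifier $\exists^L, \forall^L$ within $t^L$.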


6.4 Quantifier Elimination 


In this section we give a quantifier elimination procedure for 
the term-power structure. The procedure of this section is 
applicable whenever C is a structure with a decidable first- 
order theory. 

Definition 78 below generalizes the earlier notion of structural base formula. There are two main differences between Definition 41 and the present definition.

The first difference is the presence of three (instead of two) base formulas: shape base, leafset base, and term base. This difference is a consequence of the distinction between leafsets and terms and is needed whenever the base structure $C$ has more than two elements. There is a homomorphism formula relating the leafset base formula to the shape base formula and a homomorphism formula relating the term base formula to the shape base formula. Furthermore, some of the leafset variables are determined by term variables using inner formula maps, which establishes the relationship between the term base formula and the leafset base formula. Cardinality constraints now apply to leafset variables.

The second difference is the distinction between composed and primitive non-parameter leafset and term variables. A composed non-parameter variable denotes a leafset or a term whose shape $s$ has property $\mathsf{Is}_{f^s}(s)$ for some $f \in \Sigma$. A primitive non-parameter variable denotes a leafset or a term whose shape is $c^s$ and has property $\mathsf{Is}_{\mathsf{PRIM}}$ or $\mathsf{Is}_{\mathsf{PRIM}^L}$. The purpose of this distinction is to allow cardinality constraints and inner formula maps not only on parameter variables, but also on primitive non-parameter variables, which is useful when the base structure $C$ is decidable but infinite.


Definition 78 (Structural Base Formula)
A structural base formula with:


• free term variables $x_1, \ldots, x_m$;
• internal composed non-parameter term variables $u_1, \ldots, u_r$;
• internal primitive non-parameter term variables $u_{r+1}, \ldots, u_p$;
• internal parameter term variables $u_{p+1}, \ldots, u_{p+q}$;
• free leafset variables $x^L_1, \ldots, x^L_{m^L}$;
• internal composed non-parameter leafset variables $u^L_1, \ldots, u^L_{r^L}$;
• internal primitive non-parameter leafset variables $u^L_{r^L+1}, \ldots, u^L_{p^L}$;
• internal parameter leafset variables $u^L_{p^L+1}, \ldots, u^L_{p^L+q^L}$;
• free shape variables $x^s_1, \ldots, x^s_{m^s}$;
• internal non-parameter shape variables $u^s_1, \ldots, u^s_{p^s}$;
• internal parameter shape variables $u^s_{p^s+1}, \ldots, u^s_{p^s+q^s}$

is a formula of form:

$$\exists u_1, \ldots, u_n,\ u^L_1, \ldots, u^L_{n^L},\ u^s_1, \ldots, u^s_{n^s}.$$
$$\mathsf{shapeBase}(u^s_1, \ldots, u^s_{n^s}, x^s_1, \ldots, x^s_{m^s}) \wedge$$
$$\mathsf{leafsetBase}(u^L_1, \ldots, u^L_{n^L}, x^L_1, \ldots, x^L_{m^L}) \wedge$$
$$\mathsf{leafsetHom}(u^L_1, \ldots, u^L_{n^L}, u^s_1, \ldots, u^s_{n^s}) \wedge$$
$$\mathsf{termBase}(u_1, \ldots, u_n, x_1, \ldots, x_m) \wedge$$
$$\mathsf{termHom}(u_1, \ldots, u_n, u^s_1, \ldots, u^s_{n^s}) \wedge$$
$$\mathsf{cardin}(u^L_{r^L+1}, \ldots, u^L_{n^L}, u^s_{p^s+1}, \ldots, u^s_{n^s}) \wedge$$
$$\mathsf{innerMap}(u_{r+1}, \ldots, u_n, u^L_{r^L+1}, \ldots, u^L_{n^L}, u^s_{p^s+1}, \ldots, u^s_{n^s})$$

where $n = p + q$, $n^L = p^L + q^L$, $n^s = p^s + q^s$, and formulas shapeBase, leafsetBase, termBase, leafsetHom, termHom, cardin, innerMap are defined as follows.


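$$\mathsf{shapeBase}(u^s_1, \ldots, u^s_{n^s}, x^s_1, \ldots, x^s_{m^s}) = \bigwedge_{i=1}^{p^s} u^s_i = t_i(u^s_1, \ldots, u^s_{n^s}) \ \wedge\ \bigwedge_{i=1}^{m^s} x^s_i = u^s_{j_i} \ \wedge\ \mathsf{distinct}(u^s_1, \ldots, u^s_{n^s})$$

where each $t_i$ is a shape term of form $f^s(u^s_{i_1}, \ldots, u^s_{i_k})$ for some $f \in \Sigma_0$, $k = \mathsf{ar}(f)$, and $j : \{1, \ldots, m^s\} \to \{1, \ldots, n^s\}$ is a function mapping indices of free shape variables to indices of internal shape variables.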


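$$\mathsf{leafsetBase}(u^L_1, \ldots, u^L_{n^L}, x^L_1, \ldots, x^L_{m^L}) = \bigwedge_{i=1}^{r^L} u^L_i = t_i(u^L_1, \ldots, u^L_{n^L}) \ \wedge\ \bigwedge_{i=r^L+1}^{p^L} \mathsf{Is}_{\mathsf{PRIM}^L}(u^L_i) \ \wedge\ \bigwedge_{i=1}^{m^L} x^L_i = u^L_{j_i}$$

where each $t_i$ is a term of form $f^L(u^L_{i_1}, \ldots, u^L_{i_k})$ for some $f \in \Sigma$, $k = \mathsf{ar}(f)$, and $j : \{1, \ldots, m^L\} \to \{1, \ldots, n^L\}$ is a function mapping indices of free leafset variables to indices of internal leafset variables.

$$\mathsf{leafsetHom}(u^L_1, \ldots, u^L_{n^L}, u^s_1, \ldots, u^s_{n^s}) = \bigwedge_{i=1}^{n^L} \mathsf{lssh}(u^L_i) = u^s_{j_i}$$

where $j : \{1, \ldots, n^L\} \to \{1, \ldots, n^s\}$ is some function such that $\{j_1, \ldots, j_{p^L}\} \subseteq \{1, \ldots, p^s\}$ and $\{j_{p^L+1}, \ldots, j_{p^L+q^L}\} \subseteq \{p^s+1, \ldots, p^s+q^s\}$ (a leafset variable is a parameter variable iff its shape is a parameter shape variable).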
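$$\mathsf{termBase}(u_1, \ldots, u_n, x_1, \ldots, x_m) = \bigwedge_{i=1}^{r} u_i = t_i(u_1, \ldots, u_n) \ \wedge\ \bigwedge_{i=r+1}^{p} \mathsf{Is}_{\mathsf{PRIM}}(u_i) \ \wedge\ \bigwedge_{i=1}^{m} x_i = u_{j_i}$$

where each $t_i$ is a term of form $f(u_{i_1}, \ldots, u_{i_k})$ for some $f \in \Sigma$, $k = \mathsf{ar}(f)$, and $j : \{1, \ldots, m\} \to \{1, \ldots, n\}$ is a function mapping indices of free term variables to indices of internal term variables.

$$\mathsf{termHom}(u_1, \ldots, u_n, u^s_1, \ldots, u^s_{n^s}) = \bigwedge_{i=1}^{n} \mathsf{sh}(u_i) = u^s_{j_i}$$

where $j : \{1, \ldots, n\} \to \{1, \ldots, n^s\}$ is some function such that $\{j_1, \ldots, j_p\} \subseteq \{1, \ldots, p^s\}$ and $\{j_{p+1}, \ldots, j_{p+q}\} \subseteq \{p^s+1, \ldots, p^s+q^s\}$ (a term variable is a parameter variable iff its shape is a parameter shape variable).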


$$\mathsf{cardin}(u^L_{r^L+1}, \ldots, u^L_{n^L}, u^s_{p^s+1}, \ldots, u^s_{n^s}) = \psi_1 \wedge \cdots \wedge \psi_d$$

where each $\psi_i$ is of form

$$|t^L(u^L_{r^L+1}, \ldots, u^L_{n^L})|_{u^s} = k$$

or

$$|t^L(u^L_{r^L+1}, \ldots, u^L_{n^L})|_{u^s} \geq k$$

for some $u^s$-term $t^L(u^L_{r^L+1}, \ldots, u^L_{n^L})$ that contains no variables other than some of the variables $u^L_{r^L+1}, \ldots, u^L_{n^L}$, and the following condition holds:

If a variable $u^L_j$ for $r^L+1 \leq j \leq n^L$ occurs in the term $t^L(u^L_{r^L+1}, \ldots, u^L_{n^L})$, then $\mathsf{lssh}(u^L_j) = u^s$ occurs in formula leafsetHom. (97)

$$\mathsf{innerMap}(u_{r+1}, \ldots, u_n, u^L_{r^L+1}, \ldots, u^L_{n^L}, u^s_{p^s+1}, \ldots, u^s_{n^s}) = \gamma_1 \wedge \cdots \wedge \gamma_e$$

where each $\gamma_i$ is of form

$$u^L_j =^L \phi(u^s, u_{i_1}, \ldots, u_{i_k})$$

for some inner formula $\phi(u^s, u_{i_1}, \ldots, u_{i_k}) \in \mathsf{Inner}(u^s)$ where $r^L+1 \leq j \leq n^L$, i.e. $u^L_j$ is a primitive non-parameter leafset variable or a parameter leafset variable, $\{u_{i_1}, \ldots, u_{i_k}\} \subseteq \{u_{r+1}, \ldots, u_n\}$ are primitive non-parameter term variables and parameter variables, the conjunct $\mathsf{lssh}(u^L_j) = u^s$ occurs in leafsetHom, and the following condition holds:

$\mathsf{sh}(u_{i_j}) = u^s$ occurs in formula termHom for every $j$ where $1 \leq j \leq k$. (98)


We require each structural base formula to satisfy the following conditions:


P0) the graph associated with the shape base formula

$$\exists u^s_1, \ldots, u^s_{n^s}.\ \mathsf{shapeBase}(u^s_1, \ldots, u^s_{n^s}, x^s_1, \ldots, x^s_{m^s})$$

is acyclic (compare to Definition 21);


P1) congruence closure property for the shapeBase subformula: there are no two distinct variables $u^s_i$ and $u^s_j$ such that both $u^s_i = f(u^s_{l_1}, \ldots, u^s_{l_k})$ and $u^s_j = f(u^s_{l_1}, \ldots, u^s_{l_k})$ occur as conjuncts in formula shapeBase;


P2) congruence closure property for the leafsetBase subformula: there are no two distinct variables $u^L_i$ and $u^L_j$ such that both $u^L_i = f^L(u^L_{l_1}, \ldots, u^L_{l_k})$ and $u^L_j = f^L(u^L_{l_1}, \ldots, u^L_{l_k})$ occur as conjuncts in formula leafsetBase;


P3) congruence closure property for the termBase subformula: there are no two distinct variables $u_i$ and $u_j$ such that both $u_i = f(u_{l_1}, \ldots, u_{l_k})$ and $u_j = f(u_{l_1}, \ldots, u_{l_k})$ occur as conjuncts in formula termBase;


P4) homomorphism property of $\mathsf{lssh}$: for every non-parameter leafset variable $u^L$ such that $u^L = f^L(u^L_{i_1}, \ldots, u^L_{i_k})$ occurs in leafsetBase, if the conjunct $\mathsf{lssh}(u^L) = u^s$ occurs in leafsetHom, then for some shape variables $u^s_{j_1}, \ldots, u^s_{j_k}$ the term $u^s = f^s(u^s_{j_1}, \ldots, u^s_{j_k})$ occurs in shapeBase where $f^s = \mathsf{shapified}(f)$ and for every $r$ where $1 \leq r \leq k$, the conjunct $\mathsf{lssh}(u^L_{i_r}) = u^s_{j_r}$ occurs in leafsetHom.


P5) homomorphism property of $\mathsf{sh}$: for every non-parameter term variable $u$ such that $u = f(u_{i_1}, \ldots, u_{i_k})$ occurs in termBase, if the conjunct $\mathsf{sh}(u) = u^s$ occurs in termHom, then for some shape variables $u^s_{j_1}, \ldots, u^s_{j_k}$ the term $u^s = f^s(u^s_{j_1}, \ldots, u^s_{j_k})$ occurs in shapeBase where $f^s = \mathsf{shapified}(f)$ and for every $r$ where $1 \leq r \leq k$, the conjunct $\mathsf{sh}(u_{i_r}) = u^s_{j_r}$ occurs in termHom.
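Conditions P0 and P1 to P3 are purely syntactic and can be checked mechanically. The sketch below assumes a hypothetical representation in which a base formula is a dictionary mapping each defined variable to its constructor name and argument variables; the function names and the sample variable names are illustrative and not taken from the paper.

```python
def congruence_closed(base):
    """P1-P3: no two distinct variables are defined by the same constructor application."""
    seen = set()
    for f, args in base.values():
        key = (f, tuple(args))
        if key in seen:
            return False
        seen.add(key)
    return True

def acyclic(base):
    """P0: the graph with an edge from each variable to the arguments of its
    defining term contains no cycle (depth-first search with an 'open' mark)."""
    state = {}  # variable -> "open" while on the DFS stack, "done" afterwards

    def visit(u):
        if state.get(u) == "done":
            return True
        if state.get(u) == "open":
            return False          # reached a variable that is still on the stack
        state[u] = "open"
        ok = all(visit(v) for v in base.get(u, ("", ()))[1])
        state[u] = "done"
        return ok

    return all(visit(u) for u in base)
```

Variables that do not appear as keys (parameter variables) are treated as leaves of the graph.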


As in Section 3.4 and Section 4.3, we proceed to show that each quantifier-free formula can be written as a disjunction of base formulas and each base formula can be written as a quantifier-free formula. We first give a small example to illustrate how the techniques of the earlier sections extend to the more general case of $\Sigma$-term-power.


Example 79 We solve one subproblem from an earlier example using the language of term-power algebras. Consider the formula

$$\exists v.\ g(v, z) \leq g(z, v) \wedge \mathsf{Is}_g(v) \wedge \mathsf{Is}_g(w) \wedge \neg(g_1(w) \leq g_1(v)) \qquad (99)$$

Formula (99) is in the language of Figure 8, with $\leq$ a binary lifted relation. After converting into the language of Figure 9, we obtain as one of the possible cases the formula:

$$\exists v.\ [g(v, z) \preceq g(z, v)]_{\mathsf{sh}(g(z,v))} =^L \mathsf{true}^L_{\mathsf{sh}(g(z,v))} \wedge$$
$$\mathsf{sh}(g(v, z)) =^s \mathsf{sh}(g(z, v)) \wedge$$
$$\mathsf{Is}_g(v) \wedge \mathsf{Is}_g(w) \wedge \qquad (100)$$
$$[g_1(w) \preceq g_1(v)]_{\mathsf{sh}(g_1(w))} \neq^L \mathsf{true}^L_{\mathsf{sh}(g_1(w))} \wedge$$
$$\mathsf{sh}(g_1(v)) =^s \mathsf{sh}(g_1(w))$$

where $\preceq$ is the subtyping relation on the base structure $C$, so that $\leq\ =\ \preceq'$. We next transform the formula into unnested form, obtaining:


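$$\exists v, u_{vz}, u_{zv}, u_{w1}, u_{v1}.\ \exists^L u^L_{vz}, u^L_{w1}.\ \exists^s u^s_{vz}, u^s_{w1}.$$
$$u_{vz} = g(v, z) \wedge u_{zv} = g(z, v) \wedge$$
$$u_{w1} = g_1(w) \wedge u_{v1} = g_1(v) \wedge$$
$$u^s_{vz} =^s \mathsf{sh}(u_{vz}) \wedge u^s_{w1} =^s \mathsf{sh}(u_{w1}) \wedge$$
$$\mathsf{sh}(u_{zv}) =^s u^s_{vz} \wedge \mathsf{sh}(u_{v1}) =^s u^s_{w1} \wedge$$
$$\mathsf{Is}_g(v) \wedge \mathsf{Is}_g(w) \wedge \qquad (101)$$
$$u^L_{vz} =^L [u_{vz} \preceq u_{zv}]_{u^s_{vz}} \wedge |\neg^L u^L_{vz}|_{u^s_{vz}} = 0 \wedge$$
$$u^L_{w1} =^L [u_{w1} \preceq u_{v1}]_{u^s_{w1}} \wedge |\neg^L u^L_{w1}|_{u^s_{w1}} \geq 1$$

We next transform (101) into a disjunction of base formulas. A typical base formula is: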


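$$\exists u_{vz}, u_{zv}, u_v, u_z, u_w, u_{v1}, u_{v2}, u_{z1}, u_{z2}, u_{w1}, u_{w2}.$$
$$\exists^L u^L_{vz}, u^L_v, u^L_z, u^L_{v1}, u^L_{v2}, u^L_{z1}, u^L_{z2}, u^L_{w1}.$$
$$\exists^s u^s_{vz}, u^s_w, u^s_{w1}, u^s_{w2}.$$
$$\mathsf{shapeBase}_1 \wedge \mathsf{leafsetBase}_1 \wedge \mathsf{leafsetHom}_1 \wedge \mathsf{termBase}_1 \wedge \mathsf{termHom}_1 \wedge \mathsf{cardin}_1 \wedge \mathsf{innerMap}_1 \qquad (102)$$

where

$\mathsf{shapeBase}_1 = u^s_{vz} = g^s(u^s_w, u^s_w) \wedge u^s_w = g^s(u^s_{w1}, u^s_{w2}) \wedge \mathsf{distinct}(u^s_{vz}, u^s_w, u^s_{w1}, u^s_{w2})$

$\mathsf{leafsetBase}_1 = u^L_{vz} = g^L(u^L_v, u^L_z) \wedge u^L_v = g^L(u^L_{v1}, u^L_{v2}) \wedge u^L_z = g^L(u^L_{z1}, u^L_{z2})$

$\mathsf{leafsetHom}_1 = \mathsf{lssh}(u^L_{vz}) = u^s_{vz} \wedge \mathsf{lssh}(u^L_v) = u^s_w \wedge \mathsf{lssh}(u^L_z) = u^s_w \wedge \mathsf{lssh}(u^L_{v1}) = u^s_{w1} \wedge \mathsf{lssh}(u^L_{v2}) = u^s_{w2} \wedge \mathsf{lssh}(u^L_{z1}) = u^s_{w1} \wedge \mathsf{lssh}(u^L_{z2}) = u^s_{w2} \wedge \mathsf{lssh}(u^L_{w1}) = u^s_{w1}$

$\mathsf{termBase}_1 = u_{vz} = g(u_v, u_z) \wedge u_{zv} = g(u_z, u_v) \wedge u_v = g(u_{v1}, u_{v2}) \wedge u_z = g(u_{z1}, u_{z2}) \wedge u_w = g(u_{w1}, u_{w2}) \wedge z = u_z \wedge w = u_w$

$\mathsf{termHom}_1 = \mathsf{sh}(u_{vz}) = u^s_{vz} \wedge \mathsf{sh}(u_{zv}) = u^s_{vz} \wedge \mathsf{sh}(u_v) = u^s_w \wedge \mathsf{sh}(u_z) = u^s_w \wedge \mathsf{sh}(u_w) = u^s_w \wedge \mathsf{sh}(u_{v1}) = u^s_{w1} \wedge \mathsf{sh}(u_{z1}) = u^s_{w1} \wedge \mathsf{sh}(u_{w1}) = u^s_{w1} \wedge \mathsf{sh}(u_{v2}) = u^s_{w2} \wedge \mathsf{sh}(u_{z2}) = u^s_{w2} \wedge \mathsf{sh}(u_{w2}) = u^s_{w2}$

$\mathsf{innerMap}_1 = u^L_{v1} =^L [u_{v1} \preceq u_{z1}]_{u^s_{w1}} \wedge u^L_{z1} =^L [u_{z1} \preceq u_{v1}]_{u^s_{w1}} \wedge u^L_{v2} =^L [u_{v2} \preceq u_{z2}]_{u^s_{w2}} \wedge u^L_{z2} =^L [u_{z2} \preceq u_{v2}]_{u^s_{w2}} \wedge u^L_{w1} =^L [u_{w1} \preceq u_{v1}]_{u^s_{w1}}$

$\mathsf{cardin}_1 = |\neg^L u^L_{v1}|_{u^s_{w1}} = 0 \wedge |\neg^L u^L_{z1}|_{u^s_{w1}} = 0 \wedge |\neg^L u^L_{v2}|_{u^s_{w2}} = 0 \wedge |\neg^L u^L_{z2}|_{u^s_{w2}} = 0 \wedge |\neg^L u^L_{w1}|_{u^s_{w1}} \geq 1$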


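We next show how to transform the base formula into quantifier-free form.

We substitute away non-parameter term variables $u_{vz}, u_{zv}, u_v$ and non-parameter leafset variables $u^L_{vz}, u^L_v, u^L_z$ because the homomorphism constraints they participate in may be derived from the remaining conjuncts. We next eliminate parameter term variables $u_{v1}, u_{v2}$ and parameter leafset variables $u^L_{v1}, u^L_{v2}, u^L_{z1}, u^L_{z2}, u^L_{w1}$. Grouping the conjuncts in $\mathsf{cardin}_1$ and $\mathsf{innerMap}_1$ by their shape, we may extract the subformulas $\psi_1$ and $\psi_2$ of (102):

$$\psi_1 \equiv \exists u_{v1}.\ \exists^L u^L_{v1}, u^L_{z1}, u^L_{w1}.$$
$$\mathsf{sh}(u_{v1}) =^s u^s_{w1} \wedge \mathsf{sh}(u_{z1}) =^s u^s_{w1} \wedge \mathsf{sh}(u_{w1}) =^s u^s_{w1} \wedge$$
$$\mathsf{lssh}(u^L_{v1}) =^s u^s_{w1} \wedge \mathsf{lssh}(u^L_{z1}) =^s u^s_{w1} \wedge \mathsf{lssh}(u^L_{w1}) =^s u^s_{w1} \wedge$$
$$u^L_{v1} =^L [u_{v1} \preceq u_{z1}]_{u^s_{w1}} \wedge u^L_{z1} =^L [u_{z1} \preceq u_{v1}]_{u^s_{w1}} \wedge u^L_{w1} =^L [u_{w1} \preceq u_{v1}]_{u^s_{w1}} \wedge$$
$$|\neg^L u^L_{v1}|_{u^s_{w1}} = 0 \wedge |\neg^L u^L_{z1}|_{u^s_{w1}} = 0 \wedge |\neg^L u^L_{w1}|_{u^s_{w1}} \geq 1$$

and

$$\psi_2 \equiv \exists u_{v2}.\ \exists^L u^L_{v2}, u^L_{z2}.$$
$$\mathsf{sh}(u_{v2}) =^s u^s_{w2} \wedge \mathsf{sh}(u_{z2}) =^s u^s_{w2} \wedge$$
$$\mathsf{lssh}(u^L_{v2}) =^s u^s_{w2} \wedge \mathsf{lssh}(u^L_{z2}) =^s u^s_{w2} \wedge$$
$$u^L_{v2} =^L [u_{v2} \preceq u_{z2}]_{u^s_{w2}} \wedge u^L_{z2} =^L [u_{z2} \preceq u_{v2}]_{u^s_{w2}} \wedge$$
$$|\neg^L u^L_{v2}|_{u^s_{w2}} = 0 \wedge |\neg^L u^L_{z2}|_{u^s_{w2}} = 0$$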


Formula $\psi_1$ expresses a fact in a structure isomorphic to the power $C^n$ where $n$ is the number of leaves in the shape denoted by $u^s_{w1}$. Similarly, $\psi_2$ expresses a fact in a product structure $C^m$ where $m$ is the number of leaves in the shape denoted by $u^s_{w2}$. We can therefore use the Feferman-Vaught technique to eliminate the quantifiers from formulas $\psi_1$ and $\psi_2$. According to Example 17, $\psi_1$ is equivalent to:

$$\exists^L u^L_A, u^L_B.$$
$$u^L_A =^L (\exists^L t.\ t \preceq u_{z1} \wedge^L u_{z1} \preceq t \wedge^L u_{w1} \preceq t)_{u^s_{w1}} \wedge$$
$$u^L_B =^L (\exists^L t.\ t \preceq u_{z1} \wedge^L u_{z1} \preceq t \wedge^L \neg^L u_{w1} \preceq t)_{u^s_{w1}} \wedge$$
$$|u^L_B|_{u^s_{w1}} \geq 1 \wedge |\neg^L u^L_A \wedge^L \neg^L u^L_B|_{u^s_{w1}} = 0$$

We similarly apply the Feferman-Vaught construction to $\psi_2$ and obtain the result true. We may now substitute the results of quantifier elimination in $\psi_1$ and $\psi_2$. The resulting formula is:


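$$\exists u_{vz}, u_{zv}, u_v, u_z, u_w, u_{v1}, u_{v2}, u_{z1}, u_{z2}, u_{w1}, u_{w2}.$$
$$\exists^L u^L_{vz}, u^L_v, u^L_z, u^L_{v1}, u^L_{v2}, u^L_{z1}, u^L_{z2}, u^L_{w1}.$$
$$\exists^s u^s_{vz}, u^s_w, u^s_{w1}, u^s_{w2}.$$
$$\mathsf{shapeBase}_1 \wedge \mathsf{leafsetHom}_2 \wedge \mathsf{termBase}_2 \wedge \mathsf{termHom}_2 \wedge \mathsf{cardin}_2 \wedge \mathsf{innerMap}_2$$

where

$\mathsf{leafsetHom}_2 = \mathsf{lssh}(u^L_A) = u^s_{w1} \wedge \mathsf{lssh}(u^L_B) = u^s_{w1}$

$\mathsf{termBase}_2 = u_z = g(u_{z1}, u_{z2}) \wedge u_w = g(u_{w1}, u_{w2}) \wedge z = u_z \wedge w = u_w$

$\mathsf{innerMap}_2 = u^L_A =^L (\exists^L t.\ t \preceq u_{z1} \wedge^L u_{z1} \preceq t \wedge^L u_{w1} \preceq t)_{u^s_{w1}} \wedge u^L_B =^L (\exists^L t.\ t \preceq u_{z1} \wedge^L u_{z1} \preceq t \wedge^L \neg^L u_{w1} \preceq t)_{u^s_{w1}}$

$\mathsf{cardin}_2 = |u^L_B|_{u^s_{w1}} \geq 1 \wedge |\neg^L u^L_A \wedge^L \neg^L u^L_B|_{u^s_{w1}} = 0$

In the resulting formula all variables are expressible in terms of free variables, so we can write the formula without quantifiers $\exists, \forall, \exists^L, \forall^L$.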
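The Feferman-Vaught step in the example replaces a quantifier over a whole tuple of leaf values by per-leaf conditions plus cardinality constraints. The sketch below checks that equivalence by brute force for the pattern of $\psi_1$, under assumptions of its own: the three-element base structure `C`, the `RANK` preorder standing in for the base subtyping relation, and the function names are all hypothetical, and leaves are simply positions in a tuple.

```python
from itertools import product

C = [0, 1, 2]
RANK = {0: 0, 1: 0, 2: 1}   # hypothetical preorder: 0 and 1 are equivalent, both below 2

def below(x, y):
    return RANK[x] <= RANK[y]

def psi1_direct(z, w):
    """Exists v with v <= z and z <= v at every leaf, and w <= v failing at some leaf."""
    n = len(z)
    return any(
        all(below(v[l], z[l]) and below(z[l], v[l]) for l in range(n))
        and any(not below(w[l], v[l]) for l in range(n))
        for v in product(C, repeat=n)
    )

def psi1_feferman_vaught(z, w):
    """Same property via two leafsets and two cardinality constraints."""
    n = len(z)
    # leaves where a suitable t exists with w <= t
    a = {l for l in range(n)
         if any(below(t, z[l]) and below(z[l], t) and below(w[l], t) for t in C)}
    # leaves where a suitable t exists with w <= t failing
    b = {l for l in range(n)
         if any(below(t, z[l]) and below(z[l], t) and not below(w[l], t) for t in C)}
    return len(b) >= 1 and len(set(range(n)) - a - b) == 0
```

The second function never enumerates tuples `v`; it only inspects one leaf at a time, which is what makes the construction applicable when the number of leaves is not fixed in advance.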


The following proposition is analogous to Proposition 44; the proof is straightforward.


Proposition 80 (Quantification of Struct. Base) If $\beta$ is a structural base formula and $x$ a free shape, leafset, or term variable in $\beta$, then there exists a structural base formula $\beta_1$ equivalent to $\exists x.\beta$.


The following Proposition 81 corresponds to Proposition 45.


Proposition 81 (Quantifier-Free to Structural Base)
Let $\phi$ be a well-defined simple formula without quantifiers $\exists^L, \forall^L, \exists, \forall, \exists^s, \forall^s$. Then $\phi$ can be written as true, false, or a disjunction of structural base formulas.

Proof Sketch. The overall idea of the transformation to base formula is similar to the transformation in the proof of Proposition 45. Additional complexity is due to inner formulas. However, note that an inner formula $\phi(u^s, u_1, \ldots, u_n)$ is well-defined iff $\delta(u^s, u_1, \ldots, u_n)$ holds where

$$\delta(u^s, u_1, \ldots, u_n) = \mathsf{sh}(u_1) =^s u^s \wedge \ldots \wedge \mathsf{sh}(u_n) =^s u^s$$

Hence, each formula $\phi(u^s, u_1, \ldots, u_n)$ can be treated as a partial operation $p$ of sort

$$\mathsf{shape} \times \mathsf{term}^n \to \mathsf{leafset}$$

and the domain given by

$$D_p = \langle (u^s, u_1, \ldots, u_n),\ \delta(u^s, u_1, \ldots, u_n) \rangle$$

This means that we may apply Proposition 9 and convert the formula to a disjunction of existentially quantified well-defined conjunctions of literals in one of the following forms:


1. equality with inner formulas: $u^L_0 =^L \phi(u^s, u_1, \ldots, u_n)$ where $\phi(u^s, u_1, \ldots, u_n)$ is a $u^s$-inner formula;

2. formulas of leafset boolean algebra:

$u^L_0 =^L u^L_1 \wedge^L_{u^s} u^L_2$
$u^L_0 =^L u^L_1 \vee^L_{u^s} u^L_2$
$u^L_0 =^L \neg^L_{u^s} u^L_1$
$u^L_0 =^L \mathsf{true}^L_{u^s}$
$u^L_0 =^L \mathsf{false}^L_{u^s}$

3. formulas of term algebra of terms:

$u_1 = u_2$, $u_1 \neq u_2$
$u_0 = f(u_1, \ldots, u_n)$
$u = f_i(u_0)$
$\mathsf{Is}_f(u_0)$, $\neg\mathsf{Is}_f(u_0)$
$\mathsf{sh}(u) = u^s$

4. formulas of term algebra of leafsets:

$u^L_1 =^L u^L_2$, $u^L_1 \neq^L u^L_2$
$u^L_0 =^L f^L(u^L_1, \ldots, u^L_n)$

5. formulas of term algebra of shapes:

$u^s_1 =^s u^s_2$, $u^s_1 \neq^s u^s_2$
$u^s_0 =^s f^s(u^s_1, \ldots, u^s_n)$
$u^s =^s f^s_i(u^s_0)$
$\mathsf{Is}_{f^s}(u^s_0)$, $\neg\mathsf{Is}_{f^s}(u^s_0)$


We next describe the transformation of each existentially quantified conjunction. In the sequel, whenever we perform case analysis and generate a disjunction of conjunctions, existential quantifiers propagate to the conjunctions, so we keep working with an existentially quantified conjunction. The existentially quantified variables will become internal variables of a structural base formula.

Analogously to the proof of Proposition 45, we eliminate literals $\neg\mathsf{Is}_f(u_0)$, $\neg\mathsf{Is}_{f^L}(u^L_0)$, $\neg\mathsf{Is}_{f^s}(u^s_0)$.

As in the proof of Proposition 45, we replace formulas of leafset boolean algebra by cardinality constraints.

We next convert formulas of term algebra of terms into a base formula, formulas of term algebra of leafsets into a base formula, and formulas of term algebra of shapes into a base formula.

We simultaneously make sure that every term or leafset variable has an associated shape variable, introducing new shape variables if needed.

We also ensure homomorphism requirements by replacing internal variables when we entail their equality.

Another condition we ensure is that parameter term variables map to parameter shape variables, and non-parameter term variables to non-parameter shape variables; we do this by performing expansion of term and shape variables.

We perform expansion of shape variables as before. Expansion of term variables is even simpler because there is no need to do case analysis on equality of a term variable with other variables.

We eliminate disequalities between term variables as before. We eliminate disequalities between leafset variables as in Example 43, by converting each disequality into a cardinality constraint. Elimination of disequalities might violate previously established homomorphism invariants, so we may need to reestablish these invariants by repeating the previously described steps. The overall process terminates because we never introduce new disequalities between term or leafset variables.
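One way to read the conversion of a leafset disequality into a cardinality constraint is through the symmetric difference: two leafsets of the same shape differ exactly when at least one leaf belongs to one of them but not the other. The sketch below illustrates that reading only; the exact form used in Example 43 is not shown in this section, so the formulation and the function name are assumptions.

```python
def differ_by_cardinality(leaf_universe, L1, L2):
    """L1 != L2 iff |(L1 and not L2) or (not L1 and L2)| >= 1,
    with complements taken relative to the leaves of the common shape."""
    not_L1 = leaf_universe - L1
    not_L2 = leaf_universe - L2
    return len((L1 & not_L2) | (not_L1 & L2)) >= 1
```

The right-hand side uses only the boolean leafset operations and a cardinality test, so no disequality literal remains.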

As a final step, we convert all cardinality constraints into constraints on parameter term variables, using (95).

In the case when the shape of a cardinality constraint is $c^s$, we cannot apply (95). However, in this case, unlike in the earlier proposition, we do not do case analysis on all possible constants (which is not even possible in general). This is because Definition 78, unlike Definition 41, implies no need to further decompose cardinality constraints in that case, because we allow primitive non-parameter leafset variables.

This completes our sketch of transforming a quantifier-free formula into a disjunction of structural base formulas. ∎


We introduce the notion of determined variables in a structural base formula, generalizing the corresponding earlier definitions.

For brevity, we write $u^*$ for internal shape, term, or leafset variables, similarly $x^*$ for a free variable, $t^*$ for a term, $f^*$ for a shape, term, or leafset term algebra constructor, and $f^*_i$ for a shape, term, or leafset term algebra selector.


Definition 82 The set determinations of variable determinations of a structural base formula $\beta$ is the least set $S$ of pairs $(u^*, t^*)$ where $u^*$ is an internal term, leafset, or shape variable and $t^*$ is a term over the free variables of $\beta$, such that:

1. if $x^* = u^*$ occurs in termBase, leafsetBase, or shapeBase, then $(u^*, x^*) \in S$;

2. if $(u^*, t^*) \in S$ and $u^* = f^*(u^*_1, \ldots, u^*_k)$ occurs in shapeBase, termBase, or leafsetBase then $\{(u^*_1, f^*_1(t^*)), \ldots, (u^*_k, f^*_k(t^*))\} \subseteq S$;

3. if $\{(u^*_1, f^*_1(t^*)), \ldots, (u^*_k, f^*_k(t^*))\} \subseteq S$ and $u^* = f^*(u^*_1, \ldots, u^*_k)$ occurs in shapeBase, termBase, or leafsetBase then $(u^*, t^*) \in S$;

4. if $(u, t) \in S$ and $\mathsf{sh}(u) = u^s$ occurs in termHom then $(u^s, \mathsf{sh}(t)) \in S$;

5. if $(u^L, t^L) \in S$ and $\mathsf{lssh}(u^L) = u^s$ occurs in leafsetHom then $(u^s, \mathsf{lssh}(t^L)) \in S$;

6. if $u^L = \phi(u^s, u_1, \ldots, u_n)$ occurs in innerMap where $\phi(u^s, u_1, \ldots, u_n)$ is an inner formula and $\{(u^s, t^s), (u_1, t_1), \ldots, (u_n, t_n)\} \subseteq S$, then $(u^L, \phi(t^s, t_1, \ldots, t_n)) \in S$. (In the special case when $\phi$ contains no free term variables, if $(u^s, t^s) \in S$ then $(u^L, \phi(t^s)) \in S$.)
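Because the set of determinations is defined as a least set closed under six rules, it can be computed by iterating the rules until nothing new is added. The following sketch does this under an assumed encoding that is not part of the paper: conjuncts are passed as plain lists, terms over free variables are nested tuples tagged `"sel"`, `"sh"`, `"lssh"` or `"inner"`, and the function and parameter names are hypothetical. Termination relies on the base formulas being acyclic.

```python
from itertools import product

def determinations(free_eqs, base, hom, inner_map):
    """Least set S of (internal variable, term over free variables) pairs.

    free_eqs:  [(x, u)]                   conjuncts x = u              (rule 1)
    base:      [(u, f, [u1..uk])]         conjuncts u = f(u1..uk)      (rules 2, 3)
    hom:       [(op, u, us)]              conjuncts op(u) = us, op in {"sh", "lssh"} (rules 4, 5)
    inner_map: [(uL, phi, us, [u1..un])]  conjuncts uL = phi(us, u1..un) (rule 6)
    """
    S = {(u, x) for x, u in free_eqs}
    while True:
        terms = {}
        for u, t in S:
            terms.setdefault(u, set()).add(t)
        new = set()
        for u, f, kids in base:
            for t in terms.get(u, ()):                       # rule 2
                new |= {(k, ("sel", f, i, t)) for i, k in enumerate(kids, 1)}
            candidates = None                                # rule 3
            for i, k in enumerate(kids, 1):
                ts = {t[3] for t in terms.get(k, ())
                      if isinstance(t, tuple) and t[:3] == ("sel", f, i)}
                candidates = ts if candidates is None else candidates & ts
            new |= {(u, t) for t in candidates or ()}
        for op, u, us in hom:                                # rules 4 and 5
            new |= {(us, (op, t)) for t in terms.get(u, ())}
        for uL, phi, us, args in inner_map:                  # rule 6
            pools = [terms.get(v, set()) for v in [us] + list(args)]
            if all(pools):
                new |= {(uL, ("inner", phi) + combo) for combo in product(*pools)}
        if new <= S:
            return S
        S |= new
```

A variable is determined in the sense of the next definition exactly when it appears as the first component of some pair in the returned set.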


Definition 83 An internal variable $u^*$ is determined if $(u^*, t^*) \in$ determinations for some term $t^*$. An internal variable is undetermined if it is not determined.


Lemma 84 Let $\beta$ be a structural base formula with matrix $\beta_0$ and let determinations be the determinations of $\beta$. If $(u^*, t^*) \in$ determinations then $\models \beta_0 \Rightarrow u^* = t^*$.


Proof. By induction, using Definition 82. ∎


Corollary 85 Let $\beta$ be a structural base formula such that every internal variable is determined. Then $\beta$ is equivalent to a well-defined formula without quantifiers $\exists^L, \forall^L, \exists, \forall, \exists^s, \forall^s$.


Proof. By Lemma 84, using (7). ∎


Lemma 86 Let $u$ be an undetermined composed non-parameter term variable in a structural base formula $\beta$ such that $u$ is a source, i.e. no conjunct of form

$$u' = f(u_1, \ldots, u, \ldots, u_k)$$

occurs in termBase. Let $\beta'$ be the result of dropping $u$ from $\beta$. Then $\beta$ is equivalent to $\beta'$.


Proof. Because $u$ is a composed non-parameter term variable, it does not occur in innerMap, so it only occurs in termBase and termHom. The conjunct containing $u$ in termHom is a consequence of the remaining conjuncts, so it may be dropped. After that, we obtain a structural base formula $\beta'$ not containing $u$, where $\beta'$ is equivalent to $\beta$. ∎


Lemma 87 Let $u^L$ be an undetermined composed non-parameter leafset variable in a structural base formula $\beta$ such that $u^L$ is a source, i.e. no conjunct of form

$$u'^L = f^L(u^L_1, \ldots, u^L, \ldots, u^L_k)$$

occurs in leafsetBase. Let $\beta'$ be the result of dropping $u^L$ from $\beta$. Then $\beta$ is equivalent to $\beta'$.


Proof. Because $u^L$ is a composed non-parameter leafset variable, it does not occur in innerMap or cardin, so it only occurs in leafsetBase and leafsetHom. The conjunct containing $u^L$ in leafsetHom is a consequence of the remaining conjuncts, so it may be dropped. After that, we obtain a structural base formula $\beta'$ not containing $u^L$, where $\beta'$ is equivalent to $\beta$. ∎


Corollary 88 Every base formula is equivalent to a base formula without undetermined composed non-parameter term variables and without undetermined composed non-parameter leafset variables.


Proof. If a structural base formula has an undetermined composed non-parameter term variable, then it has an undetermined composed non-parameter term variable that is a source; similarly for leafset variables. By repeated application of Lemma 86 and its leafset analogue we eliminate all undetermined composed non-parameter term and leafset variables. ∎


The following proposition corresponds to the analogous propositions of the earlier sections.


Proposition 89 (Struct. Base to Quantifier-Free)
Every structural base formula $\beta$ is equivalent to a well-defined simple formula $\phi$ without quantifiers $\exists^L, \forall^L, \exists, \forall, \exists^s, \forall^s$.


Proof Sketch. By the preceding corollary, we may assume that $\beta$ has no undetermined composed non-parameter term and leafset variables. By the corollary on base formulas whose internal variables are all determined, we are done if there are no undetermined variables, so it suffices to eliminate:


1. undetermined parameter term variables,
2. undetermined primitive non-parameter term variables,
3. undetermined parameter leafset variables,
4. undetermined primitive non-parameter leafset variables, and
5. undetermined shape variables.


If $u$ is an undetermined parameter term variable or a primitive non-parameter term variable, then $u$ does not occur in termBase, so it occurs only in termHom and innerMap. If $u^L$ is an undetermined parameter leafset variable or a primitive non-parameter leafset variable, then $u^L$ does not occur in leafsetBase, so it occurs only in leafsetHom, innerMap, and cardin.

For an undetermined term or leafset variable of shape $u^s$ such that there is an uncovered parameter or primitive non-parameter term or leafset variable with shape $u^s$, consider all conjuncts $\gamma_i$ in innerMap of form

$$u^L_j = \phi(u^s, u_{i_1}, \ldots, u_{i_k})$$

and all conjuncts $\delta_i$ from cardin of form

$$|t^L(u^L_{r^L+1}, \ldots, u^L_{n^L})|_{u^s} = k$$

or

$$|t^L(u^L_{r^L+1}, \ldots, u^L_{n^L})|_{u^s} \geq k$$

Together with formulas from termHom and leafsetHom that contain term and leafset variables free in formulas $\gamma_i$ and $\delta_i$, these conjuncts form a formula $\eta$ which expresses a relation in the substructure of the term-power algebra which (because constructors are covariant) is isomorphic to a term-power of $C$. We therefore use the Feferman-Vaught theorem from Section 3.3 to eliminate all term and parameter variables from $\eta$. By repeating this process we eliminate all undetermined parameter and leafset variables.

It remains to eliminate undetermined shape variables. This process is similar to term algebra quantifier elimination in Section 3.4. An essential part of the construction in Section 3.4 is Lemma 25, which relies on the fact that undetermined parameter variables may take on infinitely many values. We therefore ensure that undetermined parameter shape variables are not constrained by term and parameter variables through conjuncts outside shapeBase. An undetermined parameter shape variable $u^s$ does not occur in termHom or leafsetHom because there are no parameter term and leafset variables, so $u^s$ can occur only in innerMap and cardin.

However, because undetermined parameter and leafset variables are eliminated from the formula, if $u^s$ is a parameter shape variable then exactly one of these two cases holds:


1. there are some conjuncts in innerMap and cardin that contain $u^s$ and contain some determined term and leafset variables; in this case $u^s$ is determined, or


2. there are no conjuncts in innerMap containing $u^s$ and cardin contains only domain cardinality constraints of form $|1|_{u^s} = k$ and $|1|_{u^s} \geq k$.


Hence, if $u^s$ is a shape variable it remains to eliminate the constraints of form $|1|_{u^s} = k$ and $|1|_{u^s} \geq k$. We eliminate these constraints as in the proof of Proposition 66.

In the resulting formula all variables are determined. By Corollary 85 the formula can be written as a formula without quantifiers $\exists^L, \forall^L, \exists, \forall, \exists^s, \forall^s$. ∎


The following is the main result of this paper. 


Theorem 90 (Term Power Quant. Elimination)
There exist algorithms A, B such that for a given formula $\phi$ in the language of Figure 9:


a) A produces a quantifier-free formula $\phi'$ in selector language;


b) B produces a disjunction $\phi'$ of structural base formulas.


We also explicitly state the following corollary.


Corollary 91 Let $C$ be a structure with decidable first-order theory. Then the set of true sentences in the logic of Figure 9 interpreted in the structure $P$ according to Figures 10 and 11 is decidable.


6.5 Handling Contravariant Constructors 


In this section we discuss the decidability of the %-term- 
power structure for a decidable theory C when some of the 
function symbols f € % are contravariant. We then sug- 
gest a generalization of the notion of variance to multiple 
relations and to relations with arity greater than two. 

The modifications needed to accommodate contravari- 
ance with respect to some distinguished relation symbol 
<eé R for the case of infinite C are analogous to the modifi- 
cations in Section [5.5] We this obtain a quantifier elimina- 
tion procedure for any decidable theory C in the presence of 
contravariant constructors. 


Theorem 92 (Decidability of Structural Subtyping) 
Let C be a decidable structure and P a X-term-power of C. 
Then the first-order theory of P is decidable. 


In the rest of this section we consider a generalization
that allows defining variance for every relation symbol $r \in R$
of any arity, and not just the relation symbol $\leq\, \in R$.

For a given relation symbol $r \in R$, function symbol
$f \in \Sigma$ with $k = \mathrm{ar}(f)$, and integer $i$ where $1 \leq i \leq k$,
let $P_r(f,i)$ denote a permutation of the set $\{1,\ldots,\mathrm{ar}(r)\}$ that
specifies the variance of the $i$-th argument of $f$ with respect
to the relation $r$. For example, if $r$ is a binary relation then
$P_r(f,i)$ is the identity permutation $\{(1,1),(2,2)\}$ if the $i$-th
argument of $f$ is covariant, or the transpose permutation
$\{(1,2),(2,1)\}$ if the $i$-th argument of $f$ is contravariant.

If $l \in \mathrm{leaves}(s)$ is a leaf $l = (f^1,i^1)\ldots(f^n,i^n)$, define the
permutation variance($l$) as the composition of permutations:


$$\mathrm{variance}(l) = P_r(f^n,i^n) \circ \cdots \circ P_r(f^1,i^1)$$
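
The composition that defines variance($l$) can be illustrated with a small Python sketch for a binary relation. This is only an illustration under stated assumptions: the constructor names `arrow` and `pair`, the table `P`, and the convention that a leaf is given root-first as a list of (constructor, argument index) pairs are hypothetical and do not come from the paper.

```python
IDENTITY = (1, 2)   # covariant argument: keeps the order of the two places
TRANSPOSE = (2, 1)  # contravariant argument: swaps the two places

# Hypothetical variance table P_r(f, i) for a binary relation r.
P = {
    ("arrow", 1): TRANSPOSE,
    ("arrow", 2): IDENTITY,
    ("pair", 1): IDENTITY,
    ("pair", 2): IDENTITY,
}


def compose(p, q):
    """Return p o q as a 1-based tuple, where (p o q)(i) = p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))


def variance(leaf):
    """Compose P_r(f^n, i^n) o ... o P_r(f^1, i^1) for a root-first path."""
    result = IDENTITY
    for step in leaf:
        # Each deeper step is composed on the left of what came before.
        result = compose(P[step], result)
    return result
```

Under these assumptions, a leaf reached through two contravariant positions is covariant again, while a leaf reached through exactly one contravariant position stays contravariant.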


Then define $[\![r]\!]$ by


$$[\![r]\!](s,t_1,\ldots,t_k) = \left(s, \{\, l \mid [\![r]\!]^C(t_{p_1}[l],\ldots,t_{p_k}[l]) \wedge (p_1,\ldots,p_k) = \mathrm{variance}(l) \,\}\right)$$


We generalize by defining

$$N_\pi(s) = |\{\, l \in \mathrm{leaves}(s) \mid \mathrm{variance}(l) = \pi \,\}|$$


As in Section we can transform the constraints 
$|1|_{u^s} = k$ and $|1|_{u^s} \geq k$ on each parameter shape variable
into a conjunction of constraints of form:


$$N_\pi(u^s) = k$$


or


$$N_\pi(u^s) \geq k$$


A problem on nonnegative integers. To solve the 
problem of variance with any number of relation symbols of 
any arity, it suffices to solve the following problem on sets 
of tuples of non-negative integers. 

Let Nat $= \{0,1,2,\ldots\}$. Consider the structure St $=$
Nat$^d$ for some $d \geq 2$ and let $D = \{1,2,\ldots,d\}$. If $p$ is a
permutation on $D$, let $M_p$ denote an operation St $\to$ St
defined by


$$M_p(x_1,\ldots,x_d) = (x_{p_1},\ldots,x_{p_d})$$

If $(x_1,\ldots,x_d), (y_1,\ldots,y_d) \in$ St define

$$(x_1,\ldots,x_d) + (y_1,\ldots,y_d) = (x_1+y_1,\ldots,x_d+y_d)$$


Consider a finite set of operations $f : \mathrm{St}^k \to \mathrm{St}$ where each
operation $f$ is determined by $k$ permutations $p_1^f,\ldots,p_k^f$ in
the following way:

$$f(t_1,\ldots,t_k) = M_{p_1^f}(t_1) + \cdots + M_{p_k^f}(t_k)$$

Hence, each operation $f$ of arity $k$ is given by $k$ permutations
which specify how to exchange the order of arguments in
the tuple. After permuting the arguments the tuples are
summed up.

Given a finite set $F$ of operations $f$, let $S$ be the set
generated by operations in $F$ starting from the element
$(1,0,\ldots,0) \in$ St. Let $C(n_1,\ldots,n_d)$ be a conjunction of
simple linear constraints of the forms


$$n_i = a_i$$


and

$$n_i \geq a_i$$


Consider the set


$$A_C = \{\, (n_1,\ldots,n_d) \in S \mid C(n_1,\ldots,n_d) \,\}$$

The problem is: for a given set of operations $F$, is there an
algorithm that, given $C(n_1,\ldots,n_d)$, finitely computes the set
$A_C$?

End of a problem on nonnegative integers. 
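
To make the objects in this problem concrete, the following Python sketch enumerates a bounded part of the generated set and filters it by constraints. It is not a solution to the problem, which asks for a finite description of a possibly infinite set. The function names, the bound on the coordinate sum, the encoding of constraints as triples, and the restriction to operations of arity at least one are assumptions made for the illustration.

```python
from itertools import product


def permute(p, x):
    """M_p: map (x_1, ..., x_d) to (x_{p_1}, ..., x_{p_d}); p is 1-based."""
    return tuple(x[i - 1] for i in p)


def apply_op(perms, args):
    """f(t_1, ..., t_k) = M_{p_1}(t_1) + ... + M_{p_k}(t_k)."""
    total = [0] * len(args[0])
    for p, t in zip(perms, args):
        for j, v in enumerate(permute(p, t)):
            total[j] += v
    return tuple(total)


def generate(ops, d, bound):
    """Elements generated from (1, 0, ..., 0) with coordinate sum <= bound.

    Each operation (a tuple of permutations, arity >= 1) sums its permuted
    arguments, so sums never decrease and the bounded search is exhaustive.
    """
    seen = {(1,) + (0,) * (d - 1)}
    changed = True
    while changed:
        changed = False
        for perms in ops:
            for args in product(sorted(seen), repeat=len(perms)):
                t = apply_op(perms, args)
                if sum(t) <= bound and t not in seen:
                    seen.add(t)
                    changed = True
    return seen


def satisfies(t, constraints):
    """constraints: triples (i, rel, a) meaning n_i = a or n_i >= a."""
    return all(
        t[i - 1] == a if rel == "=" else t[i - 1] >= a
        for i, rel, a in constraints
    )
```

For example, with $d = 2$ and a single binary operation whose first argument is transposed and whose second is left unchanged, `generate([((2, 1), (1, 2))], 2, 3)` yields the tuples (1,0), (1,1), (1,2) and (2,1).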

We conjecture that the technique of Lemma [68] can be 
generalized to yield a solution to the problem on nonnegative 
integers and thus establish the decidability for the notion of 
variance with respect to any number of relations with any 
number of arguments. 


6.6 A Note on Element Selection 


We make a brief note related to the choice of the language 
for making statements in term-power algebras. In Section J] 
we avoided the use of leafset variables by substituting them 
into cardinality constraints. In this section we use a cylindric 
algebra of leafsets. 

An apparently even more flexible alternative is to allow 
the element selection operation 


select :: term x leaf — elem 


where elem is a new sort, interpreted over the set C, and 
leaf is a sort interpreted over the set of pairs of a shape and 
a leaf. Instead of the formula 


rus(t1,..-,tn) = truel,s 


we would then write 


VI. rus (select(ti,1),...,select(tn,1)) =" trues 


Using the select operation we can define the update relation:

$$\mathrm{update}(t_1,l_0,e,t_2) \equiv \forall l.\ (l = l_0 \wedge \mathrm{select}(t_2,l) = e) \vee (l \neq l_0 \wedge \mathrm{select}(t_2,l) = \mathrm{select}(t_1,l))$$
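
A minimal Python sketch of selection and the update relation on terms of one fixed shape follows. The representation of a term as a dictionary from leaf paths to elements is an assumption made for the illustration, as are the sample values.

```python
def select(term, leaf):
    """Return the element stored at the given leaf of the term."""
    return term[leaf]


def update(t1, l0, e, t2):
    """Check the update relation: t2 equals t1 except that leaf l0 holds e."""
    if t1.keys() != t2.keys():
        return False  # terms of different shape are never related
    return all(
        (l == l0 and select(t2, l) == e)
        or (l != l0 and select(t2, l) == select(t1, l))
        for l in t1
    )


# Two hypothetical terms of the same shape with leaves "l" and "r".
before = {"l": 3, "r": 5}
after = {"l": 3, "r": 7}
```

The relation reads like an array update: `update(before, "r", 7, after)` holds, while changing either the stored element or the updated leaf makes it fail.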


The resulting language is at least as expressive as the lan- 
guage in Figure This language is interesting because it 
allows reasoning about updates to leaves of a tree of fixed 
shape, thus generalizing the theory of updatable arrays 
to the theory of trees with update operations, which would 
be useful for program verification. We did not choose this 
more expressive language in this report for the following 
reason. 

If the base structure C has a finite domain $C$, then for
a certain reasonable choice of the relations interpreting $L_C$ it
is possible to express statements of this extended language
in the logic of Figure 9. The idea is to assume a partial order
on the elements of $C$ with a minimal element, and use terms
$t$ with exactly one non-minimal leaf to model the leaves.


On the other hand, in the more interesting case when C
is infinite, we can easily obtain undecidable theories in the
presence of the selection operation. Namely, the selection
operation allows terms to be used as finite sets of elements of C.
The term-power therefore increases the expressiveness from
the first-order theory to the weak monadic second-order theory,
which allows quantification over finite sets of objects.
Weak monadic theory allows in particular inductive definitions.
Even if the theory of the structure C is decidable, its weak monadic
theory might therefore still be undecidable. As an example
we might take the term algebra itself, whose weak monadic
theory would allow defining the subterm relation, yielding an
undecidable theory [56, Page 508].


7 Some Connections with MSOL 


This section explores some relationships between the the- 
ory of structural subtyping and monadic second-order logic 
(MSOL) interpreted over tree-like structures. We present 
it as a series of remarks that are potentially useful for un- 
derstanding the first-order theory of structural subtyping of 
recursive types, see [36] [37] for similar results in the context 
of the theory of feature trees. 

In Section 7.1 we exhibit an embedding of MSOL of the infinite
binary tree into the first-order theory of structural
subtyping of recursive types with two constant symbols $a,b$
and one covariant binary function symbol $f$. MSOL of the infinite
binary tree is decidable. Although the embedding does
not give an answer to the decidability of the structural subtyping
of recursive types, it does show that the problem is at
least as difficult as decidability of MSOL over infinite trees.
We therefore expect that, if the theory of structural subtyping
of recursive types is decidable, the decidability proof will
likely either use decidability of MSOL over infinite trees, or
use directly techniques similar to those of [18] [57].

In Section 7.2 we use the embedding in Section 7.1 to
argue the decidability of formulas of the first-order theory
of structural subtyping of recursive types where variables
range over terms of a certain fixed infinite shape $s_\epsilon$.

In Section 7.3 we present an encoding of all terms using
terms of shape $s_\epsilon$. We argue that the main obstacle in using
this encoding to show the decidability of the first-order
theory of structural subtyping of recursive types is the inability to
define the set of all prefix-closed terms of the shape $s_\epsilon$.

In Section 7.4 we generalize the decidability result of Section 7.2
by allowing different variables to range over different
constant shapes.

In Section 7.5 we illustrate some of the difficulties in
reducing the first-order theory of structural subtyping to MSOL
over tree-like structures. We show that if we use a certain
form of infinite feature trees instead of infinite terms, the
decidability follows.

In Section 7.6 we point out that monadic second-order
logic with prefix-closed sets is undecidable, which follows
from [48]. This fact indicates that if we hope to show the
decidability of structural subtyping of recursive types, it is
essential to maintain the incomparability of types of different
shape.


7.1 Structural Subtyping Recursive Types 


In this section we define the problem of structural subtyping
of recursive types. We then give an embedding of
MSOL of the infinite binary tree into the first-order theory
of structural subtyping of infinite terms over the signature
$\Sigma = \{a,b,g\}$ with the partial order $\leq$.

We define MSOL over the infinite binary tree [6, Page 317]
as the structure MSOL$^{(2)} = (\{0,1\}^*, \mathrm{succ}_0, \mathrm{succ}_1)$. The domain
of the structure is the set $\{0,1\}^*$ of all finite strings
over the alphabet $\{0,1\}$. We denote first-order variables by
lowercase letters such as $x,y,z$. First-order variables range
over finite words $w \in \{0,1\}^*$. We denote second-order variables
by uppercase letters such as $X,Y,Z$. Second-order
variables range over finite and infinite subsets $S \subseteq \{0,1\}^*$.
The only relational symbol is equality, with the standard interpretation.
There are two function symbols, denoting the
appending of the symbol 0 and the appending of the symbol
1 to a word:


$$\mathrm{succ}_0\, w = w \cdot 0$$


$$\mathrm{succ}_1\, w = w \cdot 1$$


For the purpose of embedding into the first-order theory
of structural subtyping, we consider a structure MSOL$^{(1)} =
(\{0,1\}^*, \subseteq, \mathrm{Succ}_0, \mathrm{Succ}_1)$ equivalent to MSOL$^{(2)}$. We use the
language of MSOL without first-order variables to make
statements within MSOL$^{(1)}$. $\subseteq$ is a binary relation on sets
denoting the subset relation:


$$Y_1 \subseteq Y_2 \iff \forall x.\ x \in Y_1 \Rightarrow x \in Y_2$$


$\mathrm{Succ}_0$ and $\mathrm{Succ}_1$ are binary relations on sets, $\mathrm{Succ}_0, \mathrm{Succ}_1 \subseteq
2^{\{0,1\}^*} \times 2^{\{0,1\}^*}$, defined as follows:


$$\mathrm{Succ}_0(Y_1,Y_2) \iff Y_2 = \{\, w \cdot 0 \mid w \in Y_1 \,\}$$

$$\mathrm{Succ}_1(Y_1,Y_2) \iff Y_2 = \{\, w \cdot 1 \mid w \in Y_1 \,\}$$


The structure MSOL$^{(1)}$ is similar to one in [18]; the difference
is that relations $\mathrm{Succ}_0$ and $\mathrm{Succ}_1$ are true even for
non-singleton sets.

Lemmas 93 and 94 show the expected equivalence of
MSOL$^{(1)}$ and MSOL$^{(2)}$.


Lemma 93 (MSOL$^{(2)}$ expresses MSOL$^{(1)}$) Every relation
on sets definable in MSOL$^{(1)}$ is definable in MSOL$^{(2)}$.


Proof. We express relations $\subseteq$, $\mathrm{Succ}_0$, $\mathrm{Succ}_1$ as formulas in
MSOL$^{(2)}$, as follows. We express $Y_1 \subseteq Y_2$ as


$$\forall x.\ Y_1(x) \Rightarrow Y_2(x),$$


$\mathrm{Succ}_0(Y_1,Y_2)$ as


$$\forall x.\ Y_2(x) \Leftrightarrow \exists y.\ Y_1(y) \wedge x = \mathrm{succ}_0(y),$$


and $\mathrm{Succ}_1(Y_1,Y_2)$ as

$$\forall x.\ Y_2(x) \Leftrightarrow \exists y.\ Y_1(y) \wedge x = \mathrm{succ}_1(y).$$


The statement follows by induction on the structure of formulas. ∎


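Let $R \subseteq (2^{\{0,1\}^*})^k \times (\{0,1\}^*)^n$ be a relation of arity $k+n$.
Define $R^* \subseteq (2^{\{0,1\}^*})^k \times (2^{\{0,1\}^*})^n$ by

$$R^*(Y_1,\ldots,Y_k,X_1,\ldots,X_n) \iff \exists x_1,\ldots,x_n.\ X_1 = \{x_1\} \wedge \cdots \wedge X_n = \{x_n\} \wedge R(Y_1,\ldots,Y_k,x_1,\ldots,x_n)$$


Lemma 94 (MSOL$^{(1)}$ expresses MSOL$^{(2)}$) If $R$ is definable
in MSOL$^{(2)}$, then $R^*$ is definable in MSOL$^{(1)}$.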


Proof Sketch. The property of being an empty set is definable
in MSOL$^{(1)}$ by the formula


$$\phi_0(Y_1) \equiv \forall Y_2.\ Y_1 \subseteq Y_2$$


The relation $\subset$ of being a proper subset is definable in
MSOL$^{(1)}$ by the formula


$$\phi_1(Y_1,Y_2) \equiv Y_1 \subseteq Y_2 \wedge Y_1 \neq Y_2$$


and the relation $\subset_1$ of having one element more is definable
by the formula


$$\phi_2(Y_1,Y_2) \equiv Y_1 \subset Y_2 \wedge \neg \exists Z.\ Y_1 \subset Z \wedge Z \subset Y_2$$


The property of being a singleton set can then be expressed
by the formula


$$\phi_3(Y_1) \equiv \exists Y_0.\ \phi_0(Y_0) \wedge Y_0 \subset_1 Y_1$$


We define the relation on singletons corresponding to $\mathrm{succ}_0$
by


$$\phi_4(Y_1,Y_2) \equiv \phi_3(Y_1) \wedge \phi_3(Y_2) \wedge \mathrm{Succ}_0(Y_1,Y_2)$$


Similarly, the relation corresponding to $\mathrm{succ}_1$ is defined by

$$\phi_5(Y_1,Y_2) \equiv \phi_3(Y_1) \wedge \phi_3(Y_2) \wedge \mathrm{Succ}_1(Y_1,Y_2)$$


If $R$ is expressible by some formula $\psi$ in MSOL$^{(2)}$, then $R$ is
expressible by a formula in prenex normal form, so suppose
$\psi$ is of form


$$Q_1 V_1 \ldots Q_n V_n.\ \psi_0$$


where $\psi_0$ is quantifier free. We construct a formula $\psi'$ expressing
$R^*$ in MSOL$^{(1)}$. We obtain the matrix $\psi'_0$ of $\psi'$
by translating $\psi_0$ as follows. If $x$ is a first-order variable in
$\psi_0$, we represent it with a second-order variable $X$ denoting
a singleton set. We replace membership relation $Y(x)$
with subset relation $X \subseteq Y$. We replace $\mathrm{succ}_0$ with $\phi_4$ and
$\mathrm{succ}_1$ with $\phi_5$. We construct $\psi'$ by adding quantifiers to
$\psi'_0$ as follows. Second-order quantifiers remain the same.
First-order quantifiers are relativized to range over singleton
sets: $\forall x.\ \psi_i$ becomes $\forall X.\ \phi_3(X) \Rightarrow \psi'_i$ and $\exists x.\ \psi_i$ becomes
$\exists X.\ \phi_3(X) \wedge \psi'_i$.
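
The chain of definitions that builds "singleton" from the subset relation alone can be checked by brute force. The Python sketch below does this over the subsets of a three-element universe. The finite universe is an assumption made so that the quantifiers can be enumerated; the actual structure has the infinite domain of all subsets of $\{0,1\}^*$.

```python
from itertools import combinations

# Assumed finite stand-in for {0,1}*: three short words.
UNIVERSE = ("", "0", "1")
SETS = [
    frozenset(c)
    for n in range(len(UNIVERSE) + 1)
    for c in combinations(UNIVERSE, n)
]


def phi0(y1):
    """Empty set: Y1 is a subset of every set."""
    return all(y1 <= y2 for y2 in SETS)


def phi1(y1, y2):
    """Proper subset, written using only the subset relation and equality."""
    return y1 <= y2 and y1 != y2


def phi2(y1, y2):
    """Y2 has one element more: no set lies strictly between Y1 and Y2."""
    return phi1(y1, y2) and not any(phi1(y1, z) and phi1(z, y2) for z in SETS)


def phi3(y1):
    """Singleton: Y1 has one element more than some empty set."""
    return any(phi0(y0) and phi2(y0, y1) for y0 in SETS)
```

On this universe the sets satisfying `phi3` are exactly the three one-element sets, which is what the relativization of first-order quantifiers relies on.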


We can view MSOL$^{(1)}$ as a first-order structure with the
domain $2^{\{0,1\}^*}$. We show how to embed MSOL$^{(1)}$ into the
first-order theory of structural subtyping.

We define the first-order structure of structural subtyping
of recursive types similarly to the corresponding structure
for non-recursive types in Section 4; the only difference
is that the domain contains both finite and infinite terms.
Infinite terms correspond to infinite trees [12] [30].

We define infinite trees as follows. We use the alphabet $\{l,r\}$
to denote paths in the tree. A tree domain $D$ is a finite or
infinite subset of the set $\{l,r\}^*$ such that:


1. $D$ is prefix-closed: if $w \in \{l,r\}^*$, $x \in \{l,r\}$ then
$w \cdot x \in D$ implies $w \in D$;


2. if $w \in D$ then exactly one of the following two properties
holds:


(a) $w$ is an interior node: $\{w \cdot l, w \cdot r\} \subseteq D$

(b) $w$ is a leaf: $\{w \cdot l, w \cdot r\} \cap D = \emptyset$.


A tree with a tree domain $D$ is a total function $T$ from the
set of leaves of $D$ to the set $\{a,b\}$.

Note that the tree domain $D$ of a tree $T$ can be reconstructed
from $T$ as the prefix closure of the domain of the
graph of function $T$; we write TDom($T$) for the tree domain
of tree $T$.

Two trees are equal if they are equal as functions. Hence,
equal trees have equal function domains and equal tree domains.

We say that $T_1 \leq T_2$ iff TDom($T_1$) = TDom($T_2$) and
$T_1(w) \leq_0 T_2(w)$ for every leaf $w \in$ TDom($T_1$). Here $\leq_0$ is
the relation $\{(a,a),(a,b),(b,b)\}$.

If $T_1$ and $T_2$ are trees, then $g(T_1,T_2)$ denotes the tree $T$
such that


$$\mathrm{TDom}(T) = \{\, l \cdot w \mid w \in T_1 \,\} \cup \{\, r \cdot w \mid w \in T_2 \,\}$$
$$T(l \cdot w) = T_1(w), \quad \text{if } w \in T_1$$
$$T(r \cdot w) = T_2(w), \quad \text{if } w \in T_2$$

Let IT denote the set of all infinite trees. The structural
subtyping structure is the structure SIT $= (\mathrm{IT}, g, a, b, \leq)$.
SIT is an infinite-term counterpart to the structure BS from
Section 4.
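
The definitions of tree domain, ordering and constructor can be tried out on finite trees. The Python sketch below represents a tree as a dictionary from leaf words over the letters l and r to labels a or b. Restricting to finite trees and using this dictionary representation are assumptions of the sketch; the structure SIT also contains infinite trees, which this code does not model.

```python
LEQ0 = {("a", "a"), ("a", "b"), ("b", "b")}


def tdom(tree):
    """Tree domain: the prefix closure of the set of leaf words."""
    return {w[:i] for w in tree for i in range(len(w) + 1)}


def leq(t1, t2):
    """T1 <= T2: equal tree domains and leafwise comparison by LEQ0."""
    return tdom(t1) == tdom(t2) and all((t1[w], t2[w]) in LEQ0 for w in t1)


def g(t1, t2):
    """Binary constructor: t1 becomes the l-subtree and t2 the r-subtree."""
    tree = {"l" + w: v for w, v in t1.items()}
    tree.update({"r" + w: v for w, v in t2.items()})
    return tree


# The two constants are one-leaf trees whose only leaf is the empty word.
A = {"": "a"}
B = {"": "b"}
```

With these definitions `leq(g(A, A), g(A, B))` holds, while `leq(A, g(A, A))` fails because the two trees have different tree domains.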


Similarly to the case of finite terms, define the relation
$\sim$ of "being of the same shape" in SIT by


$$t_1 \sim t_2 \equiv \exists t_0.\ t_0 \leq t_1 \wedge t_0 \leq t_2$$


Observe that $t_1 \sim t_2$ iff TDom($t_1$) = TDom($t_2$).

We next present an embedding $\iota$ of MSOL$^{(1)}$ into SIT.
The image of the embedding $\iota$ consists of the infinite trees that are
in the same $\sim$-equivalence class as the tree $t_\epsilon$. We define
$t_\epsilon$ as the unique solution of the equation:


$$t_\epsilon = g(g(t_\epsilon, t_\epsilon), a)$$


Trees in the $\sim$-equivalence class of $t_\epsilon$ have the tree domain
$D = \mathrm{TDom}(t_\epsilon)$ given by the regular context-free grammar


$$D \to \epsilon \mid r \mid l \mid lrD \mid llD$$


whereas the leaves $L$ of $D$ are given by the context-free grammar

$$L \to r \mid lrL \mid llL$$


or the regular expression $(lr|ll)^* r$. Let $h$ be the homomorphism
of words from $\{0,1\}^*$ to $\{l,r\}^*$ such that


$$h(0) = ll$$
$$h(1) = lr$$


If $w = a_1 \ldots a_n$ is a word, then $w^R$ denotes the reverse of
the word, $w^R = a_n \ldots a_1$.

We define the embedding $\iota$ to map a set $Y \subseteq \{0,1\}^*$ into
the unique tree $t$ such that $t \sim t_\epsilon$ and for every $w \in \{0,1\}^*$,


$$w \in Y \iff t(h(w^R) \cdot r) = b \qquad (103)$$


Observe that $\iota(\emptyset) = t_\epsilon$. Define formulas $\mathrm{TSucc}_0(t_1,t_2)$ and
$\mathrm{TSucc}_1(t_1,t_2)$ as follows:


$$\mathrm{TSucc}_0(t_1,t_2) \equiv t_2 = g(g(t_1, t_\epsilon), a)$$
$$\mathrm{TSucc}_1(t_1,t_2) \equiv t_2 = g(g(t_\epsilon, t_1), a)$$

It is straightforward to show that $\iota$ is an injection and that
$\iota$ maps relation $\subseteq$ into $\leq$, relation $\mathrm{Succ}_0$ into $\mathrm{TSucc}_0$, and
relation $\mathrm{Succ}_1$ into $\mathrm{TSucc}_1$. Moreover, the range of $\iota$ is the
set of all terms $t$ such that sh($t$) = $s_\epsilon$ where $s_\epsilon$ = sh($t_\epsilon$).
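
The embedding can be exercised on finite sets of words with a short Python sketch. A tree in the image of the embedding is represented only by the set of its leaf paths labeled b, since every other leaf is labeled a and the shape is fixed; the image of the empty set is then the empty set of b-leaves. This representation and the helper names `iota`, `gg`, `succ0`, `succ1` are assumptions of the sketch. Under this representation the subtype ordering between two such trees is inclusion of their b-leaf sets.

```python
import re

H = {"0": "ll", "1": "lr"}


def h(word):
    """Homomorphism from words over 0, 1 to words over l, r."""
    return "".join(H[c] for c in word)


def iota(ys):
    """b-labeled leaves of the image tree: one leaf h(reverse(w)) + r per w."""
    return frozenset(h(w[::-1]) + "r" for w in ys)


def gg(t1, t2, c):
    """b-labeled leaves of g(g(t1, t2), c) for trees given by their b-leaves."""
    leaves = {"ll" + p for p in t1} | {"lr" + p for p in t2}
    if c == "b":
        leaves.add("r")
    return frozenset(leaves)


def succ0(ys):
    return {w + "0" for w in ys}


def succ1(ys):
    return {w + "1" for w in ys}


EMPTY = frozenset()  # image of the empty set: every leaf is labeled a
```

For a sample set `Y`, the sketch lets one check that `iota(succ0(Y)) == gg(iota(Y), EMPTY, "a")` and `iota(succ1(Y)) == gg(EMPTY, iota(Y), "a")`, that is, appending a symbol to every word corresponds to placing the image tree under two constructor applications.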


7.2 A Decidable Substructure 


Section 7.1 shows that terms of shape $s_\epsilon$ form a substructure
within SIT that is isomorphic to MSOL$^{(1)}$. In this section
we consider the following converse problem.

Consider the formulas BF that, instead of quantifiers
$\exists,\forall$, contain bounded quantifiers $\exists_\epsilon,\forall_\epsilon$ that range over the
elements of the set


$$T_\epsilon = \{\, t \mid \mathrm{sh}(t) = s_\epsilon \,\}$$


We show that the set of closed formulas from BF that are
true in SIT is decidable.

Although the quantifiers are bounded, terms in this logic
can still denote elements of shape other than $s_\epsilon$. For example,
in the atomic formula


$$g(x_1,x_2) \leq g(x_3, g(g(x_4,x_5), b))$$


the term $g(x_1,x_2)$ denotes a term of the shape $g^s(s_\epsilon,s_\epsilon)$.
First we show that all atomic formulas can be converted into
formulas of one of the following forms:


1. $x_0 = g(g(x_1,x_2),a)$;


2. $x_0 = g(g(x_1,x_2),b)$;
3. $x_1 = x_2$;
4. $x_1 \leq x_2$.


Consider an atomic formula $t_1 = t_2$. The key idea is that if
sh($t_1$) $\neq$ sh($t_2$) then the formula $t_1 = t_2$ is false.

If neither of the terms $t_1$ and $t_2$ is a variable then each of them
is a constant or a constructor application. If $t_1 = g(t_{11},t_{12})$
then either $t_1 = t_2$ is false or $t_2 = g(t_{21},t_{22})$ for some $t_{21},t_{22}$.
We may therefore decompose $t_1 = t_2$ into $t_{11} = t_{21}$ and
$t_{12} = t_{22}$. By repeating this decomposition we arrive at
formulas of form $t_1 = t_2$ where both $t_1$ and $t_2$ are constants or
at the equality of form $x_0 = t(x_1,\ldots,x_n)$. The equalities
between the constants can be trivially evaluated. This leaves
only formulas of form $x_0 = t(x_1,\ldots,x_n)$. Let $t^s(x^s_1,\ldots,x^s_n)$ be
a shape term that results from replacing $a$ and $b$ with $c^s$ and
replacing $g$ with $g^s$ in $t$. Because all variables range over $T_\epsilon$,
we conclude that $x_0 = t(x_1,\ldots,x_n)$ can be true only if


$$s_\epsilon = t^s(s_\epsilon,\ldots,s_\epsilon)$$

If $t(x_1,\ldots,x_n) \in \{a,b\}$ then this condition fails. If
$t(x_1,\ldots,x_n) \equiv x_1$, we obtain a formula of the desired form.

So assume $t(x_1,\ldots,x_n) = g(t_{21},t_{22})$. Then sh($t_{21}$) =
$g^s(s_\epsilon,s_\epsilon)$ and sh($t_{22}$) = $c^s$. Therefore, $t_{21} = g(t_{211},t_{212})$
where either sh($t_{211}$) = sh($t_{212}$) = $s_\epsilon$ or $t_1 = t_2$ is
false. Similarly, either $t_{22} \in \{a,b\}$ or $t_1 = t_2$ is false.
Therefore, $t(x_1,\ldots,x_n) = g(g(t_{211},t_{212}),a)$, $t(x_1,\ldots,x_n) =
g(g(t_{211},t_{212}),b)$, or $t_1 = t_2$ is false. If $t(x_1,\ldots,x_n) =
g(g(t_{211},t_{212}),a)$ then we may replace $t_1 = t_2$ with the
formula


$$\exists_\epsilon y_1,y_2.\ x_0 = g(g(y_1,y_2),a) \wedge y_1 = t_{211} \wedge y_2 = t_{212}$$


and similarly in the other case. By continuing this process
by induction on the structure of the term $t(x_1,\ldots,x_n)$
we either conclude that $t_1 = t_2$ is false, or we conclude that
$t_1 = t_2$ is equivalent to a conjunction of formulas of the
desired form.

Conversion of an atomic formula of form $t_1 \leq t_2$ is analogous
to the conversion of formulas $t_1 = t_2$.

To see the decidability it now suffices to convert
the formulas of the form $x_0 = g(g(x_1,x_2),a)$ and
$x_0 = g(g(x_1,x_2),b)$ into formulas $\mathrm{TSucc}_0(t_1,t_2)$ and
$\mathrm{TSucc}_1(t_1,t_2)$. Expressibility of $x_0 = g(g(x_1,x_2),a)$ follows
from the fact that the following relationship between
$X_0, X_1, X_2$ is expressible in MSOL:


$$X_0 = \{\, w \cdot 0 \mid w \in X_1 \,\} \cup \{\, w \cdot 1 \mid w \in X_2 \,\}$$


Similarly, the expressibility of $x_0 = g(g(x_1,x_2),b)$ follows
from the fact that


$$X_0 = \{\, w \cdot 0 \mid w \in X_1 \,\} \cup \{\, w \cdot 1 \mid w \in X_2 \,\} \cup \{\epsilon\}$$

is expressible in MSOL. We conclude that the set of closed
BF formulas that are true in SIT is decidable.


7.3 Embedding Terms into Terms 


We next give an embedding of the set of all terms into $T_\epsilon$.
As in Section 7.1, let $t_\epsilon$ be the unique solution of the equation
$t_\epsilon = g(g(t_\epsilon,t_\epsilon),a)$ and let


$$t_4(x_1,x_2,x_3,x_4) = g(g(g(g(x_1,x_2),x_3),t_\epsilon),x_4)$$


Define

$$t_a = t_4(t_\epsilon,t_\epsilon,a,a)$$
$$t_b = t_4(t_\epsilon,t_\epsilon,a,b)$$
$$t_g(x_1,x_2) = t_4(x_1,x_2,b,b)$$


Then define the homomorphism $h_T$ from the set of all terms
to the set $T_\epsilon$ by


$$h_T(a) = t_a$$
$$h_T(b) = t_b$$
$$h_T(g(t_1,t_2)) = t_g(h_T(t_1),h_T(t_2))$$


Then $h_T$ is an embedding of the set of all terms into the
subset $T_\epsilon$ of all terms. The term algebra operations $a,b,g$
map to $t_a,t_b,t_g$ and $\leq$ maps to $\leq$.

Note that, if it were possible to define a predicate $P(x)$
such that


$$P(x) \iff \exists y.\ h_T(y) = x \qquad (104)$$


then we could express all statements of SIT within the BF
subtheory, and therefore SIT would be decidable.

The fundamental problem with specifying $P(x)$ is not the
use of two bits to encode the three possible elements $\{a,b,g\}$,
but the constraint that if a term contains a subterm of the
form $t_4(t_1,t_2,a,a)$ or $t_4(t_1,t_2,a,b)$ at some even depth, then
$t_1 = t_2 = t_\epsilon$. Compared to the relationships given by the
constructor $g$, this constraint requires talking about the successor
relation at the opposite side of the paths within a tree, see
Section 


7.4 Subtyping Trees of Known Shape 


We next argue that if we allow the logic to have a copy of
bounded quantifiers $\exists_s,\forall_s$ for every constant shape $s$, we
obtain a decidable theory. To denote constant shapes in a
finite number of symbols we consider in addition to term
algebra symbols $g^s$, $c^s$ the expressions that yield solutions of
mutually recursive equations on shapes; the details of the
representation of types are not crucial for our argument, see
e.g. 


Consider a closed formula in such a language. Because every
variable has an associated constant shape, we can compute
the set of all shapes occurring in the formula. This
means that all variables of the formula range over a finite
known set of shapes. This allows us to define the predicate
$P$ given by (104) as a disjunction of cases, one case for every
shape. Define functions $h_{\min}$, $h_{\max}$ that take a shape and
produce a lower and upper bound for terms of that shape:


$$h_{\min}(c^s) = t_a$$
$$h_{\min}(g^s(t^s_1,t^s_2)) = t_g(h_{\min}(t^s_1),h_{\min}(t^s_2))$$
$$h_{\max}(c^s) = t_b$$
$$h_{\max}(g^s(t^s_1,t^s_2)) = t_g(h_{\max}(t^s_1),h_{\max}(t^s_2))$$


If $s_1,\ldots,s_n$ is the list of shapes occurring in a formula, we
then define a predicate $P$ specific to that formula by


$$P(t) \equiv \bigvee_{i=1}^{n} \left( h_{\min}(s_i) \leq t \wedge t \leq h_{\max}(s_i) \right)$$


We can therefore define $P(t)$ and use it to translate the
formula into a BF formula of the same truth value. Therefore,
structural subtyping with quantification bounded to
constant shapes is decidable.

For decidability of the structural subtyping of recursive
types it would be interesting to examine the decision procedure
for MSOL and determine whether there is some uniformity
in it that would allow us to handle even quantification
over shapes that are determined by variables.


7.5 Recursive Feature Trees 


We next remark that a certain notion of subtyping of recursive
feature trees is decidable. By a feature tree we mean an infinite
tree built using a constructor which takes other feature
trees and an optional node label as an argument. In this section
we consider the simple case of one binary constructor $f$
and assume only one label denoted by 1. Hence, an empty
feature tree is a feature tree, and if $t_1$ and $t_2$ are feature trees
then so are $f^\epsilon(t_1,t_2)$ and $f^1(t_1,t_2)$. We represent an empty
feature tree $\epsilon$ by an infinite tree that has all features $\epsilon$. We
compare feature trees as follows. Let $\leq$ be defined on the
features $\{\epsilon,1\}$ as the relation $\{(\epsilon,\epsilon),(\epsilon,1),(1,1)\}$. Define $\leq$
on trees as the least relation such that:


1. $\epsilon \leq t$ for all terms $t$;
2. $t_1 \leq t'_1$ and $t_2 \leq t'_2$ implies
$f^{r_1}(t_1,t_2) \leq f^{r_2}(t'_1,t'_2)$
for all $r_1,r_2 \in \{\epsilon,1\}$ such that $r_1 \leq r_2$.


The decidability of feature trees follows from Section 7.1
because of the isomorphism $h_F$ between the set of terms $T_\epsilon$
and the set of feature trees. Here $h_F$ is defined by:


$$h_F(\epsilon) = t_\epsilon$$
$$h_F(f^\epsilon(t_1,t_2)) = g(g(h_F(t_1),h_F(t_2)),a)$$
$$h_F(f^1(t_1,t_2)) = g(g(h_F(t_1),h_F(t_2)),b)$$


The feature trees as we defined them have a limited feature
and node label alphabet. This is not a fundamental
problem. Muchnik's theorem [57] gives the decidability of
MSOL of trees over arbitrary decidable structures. It is 
reasonable to expect that the decidability of MSOL over 
decidable structures yields a generalization of the result of 
Section and therefore the decidability of feature trees 
with a richer vocabulary of features. 

The crucial property of our definition of feature trees is
that features can appear in any node of the tree. Hence,
there are no prefix closure requirements on trees as in Section 7.3,
which is responsible for the relatively simple reduction
to MSOL.


7.6 Reversed Binary Tree with Prefix-Closed Sets 


It is instructive to compare the difficulties our approach
faces in showing the decidability of structural subtyping of
recursive types with the difficulties reported in [48]. In
[48, Section 5.3] the authors remark that the difficulty with applying
tree automata is that the set $x = f(y,z)$ is not regular.
By reversing the set of paths in a tree representing
a term we have shown in Section 7.1 that the relationship
$x = f(y,z)$ becomes expressible. However, the difficulty now
becomes specifying a set of words that represents a valid
term, because there is no immediate way of stating that a
set of words is prefix-closed. If we add an operation that
allows expressing relationships at both "ends" of the words,
we obtain a structure whose MSOL is undecidable due to
the following result [52, Page 183].


Theorem 95 The MSOL theory of the structure with two successor
operations $w \cdot 0$ and $w \cdot 1$ and one inverse successor
operation $0 \cdot w$ is undecidable.


The case that is of interest to us is the dual of Theorem 95
under the word-reversing isomorphism: a structure with operations
$0 \cdot w$, $1 \cdot w$, $w \cdot 0$ has an undecidable MSOL theory.

Instead of expressing prefix-closure using operations $w \cdot 0$,
$w \cdot 1$, let us consider MSOL over the structure that contains
only operations $0 \cdot w$ and $1 \cdot w$, but where all second-order
variables range over prefix-closed sets. This logic also turns
out to be undecidable.

Let PCI be the set of prefix-closed sets. For each word $w$,
there exists the smallest PCI set containing $w$, namely the
set $C(w)$ of all prefixes of $w$, given by:


$$C(w) = \{\, w' \mid w' \preceq w \,\}$$


Every subset of $C(w)$ in PCI is of the form $C(w_1)$ for some
word $w_1$. Define $\mathrm{PSucc}_0$ and $\mathrm{PSucc}_1$ on PCI by:


$$\mathrm{PSucc}_0(X_1,X_2) \equiv \exists w.\ X_1 = C(w) \wedge X_2 = C(0 \cdot w)$$
$$\mathrm{PSucc}_1(X_1,X_2) \equiv \exists w.\ X_1 = C(w) \wedge X_2 = C(1 \cdot w)$$


Ww 


Consider a monadic theory PrefT with relations $\mathrm{PSucc}_0$ and
$\mathrm{PSucc}_1$ where second-order variables range over the subsets
of PCI. It is easy to see that PrefT corresponds to the first-order
theory of non-structural subtyping of recursive types,
with subset relation $\subseteq$ corresponding to subtype relation $\leq$,
empty set corresponding to the least type $\bot$, $\mathrm{PSucc}_0(X_1,X_2)$
corresponding to $X_2 = f(X_1,\bot)$, and $\mathrm{PSucc}_1(X_1,X_2)$
corresponding to $X_2 = f(\bot,X_1)$. The first-order theory of
non-structural subtyping was shown undecidable in [48], so
PrefT is undecidable. An interesting open problem is the
decidability of fragments of the first-order theory of structural
subtyping. This problem translates directly to the decidability
of the fragments of PrefT, a monadic theory with
prefix-closed sets, or, under the word-reversal isomorphism,
the decidability of fragments of the monadic theory of two
successor symbols with suffix-closed sets.


8 Conclusion 


In this paper we presented a quantifier elimination procedure
for the first-order theory of structural subtyping of
non-recursive types. Our decidability proof clarifies the
structure of the theory of structural
subtyping by introducing explicitly the notion of shape of a
term.

We presented the proof in several stages with the hope of
making the paper more accessible and self-contained. Our
result on the decidability of $\Sigma$-term-power is more general
than the decidability of structural subtyping of non-recursive
types, because we allow even infinite decidable base structures
for primitive types. We view this decidability result
as an interesting generalization of the decidability for term
algebras and decidability of products of decidable theories.
This generalization is potentially useful in theorem proving
and program verification.

Of potential interest might be the study of axiomatizability
properties; the quantifier elimination approach is appropriate
for this purpose [31] [30]. We did not pay much attention
to this because we view the language and the mechanism
for specifying the axioms as of secondary importance.

Our goal in describing quantifier elimination procedure 
was to argue the decidability of the theory of structural sub- 
typing. While it should be relatively easy to extract an algo- 
rithm from our proofs, we did not give a formal description 
of the decision procedure. One possible formulation of the 
decision procedure would be a term-rewriting system such as 
[11]; this formulation is also appropriate for implementation 
within a theorem prover. Our approach eliminates quanti- 
fiers as opposed to quantifier alternations. For that purpose 
we extended the language with partial functions. The use of 
Kleene logic for partial functions seems to preserve most of 
the properties of two valued logic and appears to agree with 
the way partial functions are used in informal mathematical 
practice. An alternative direction for proving decidability
of structural subtyping would be to use Ehrenfeucht-Fraïssé
games [53, Page 405]; [15] uses techniques based on games
to study both the decidability and the computational complexity
of theories.

The complexity of our decision procedure for structural subtyping
of non-recursive types is non-elementary and is a consequence
of the non-elementary complexity of the term algebra,
whose elements and operations are present in the theory
of structural subtyping. Tools like MONA show that
non-elementary complexity does not necessarily make the
implementation of a decision procedure uninteresting. An
interesting property of quantifier elimination is that it can
be applied partially to eliminate an innermost quantifier
from some formula. This property makes our decision procedure
applicable as part of an interactive theorem prover
or a subroutine of a more general decision procedure.

In this paper we have left open the decidability of structural
subtyping of recursive types, giving only a few remarks
in Section 7. In particular we have observed in Section 7.1
that every formula in the monadic second-order theory of the
infinite binary tree [6, Page 317] has a corresponding formula
in the first-order theory of structural subtyping of recursive
types. In that sense, the decision problem for structural
subtyping of recursive types is at least as hard as the decision
problem for the monadic second-order logic interpreted over
the infinite binary tree. This observation is relevant for two
reasons.

First, it is unlikely that a minor modification of the quantifier
elimination technique we used to show the decidability
of structural subtyping of non-recursive types can be used
to show the decidability for recursive types. Because of the
embedding in Section 7.1, such a quantifier-elimination proof
would have to subsume the determinization of tree automata
over infinite trees.

Second, the embedding suggests even greater difficulties
in implementing a decision procedure for the first-order theory
of structural subtyping (provided that it exists). While
we know at least one interesting example of a weak monadic
second-order logic decision procedure, namely [25], we are
not aware of any implementation of the full monadic second-order
logic decision procedure for the infinite tree.

The relationship between the non-structural as well as
structural subtyping and monadic second-order logic of the
infinite binary tree and tree-like structures requires further
study. In that respect the work on feature trees [36] [37]
appears particularly relevant.